Worth Knowing
New Releases from Google, OpenAI, and Meta: The pace of model releases hasn’t slowed in recent weeks, with some of the biggest players in AI — Google, OpenAI, and Meta — releasing new models and features:
- Google released Gemini 2.5, its most powerful model yet. A reasoning model like OpenAI’s o1 and o3 models, Gemini 2.5 Pro Experimental (the first model from the family to be released) scores highly on a number of benchmarks and is the new top model on the popular LMArena leaderboard. Soon after OpenAI debuted ChatGPT, Google executives reportedly declared “code red” and channeled significant resources into developing and releasing new AI models and tools. More than two years on, those investments are bearing serious fruit: between Gemini 2.5 Pro Experimental and its faster Gemini 2.0 Flash Thinking, Gemini 2.0 Flash, and Gemini 2.0 Flash Lite models, Google now has the best-performing models at a range of price points. But while that’s nothing to scoff at, the company is still having more trouble than it would like turning performance into market share. With OpenAI rumored to be closing in on 1 billion active users, Google staged a shakeup of its Gemini leadership team, elevating Josh Woodward — who oversaw the launch of one of Google’s biggest AI successes, NotebookLM — to lead the Gemini effort.
- On the heels of its GPT-4.5 launch last month, OpenAI continued shipping with a number of noteworthy releases, including: vastly improved image generation, improved memory in ChatGPT, the public release of its o3 and o4-mini models, and a new series of GPT-4.1 models for its API users. One of the key takeaways from this slew of releases is that OpenAI, like Google, clearly sees the consumer market as the strategic center of gravity. While the Gemini 2.5 Pro announcement could have stolen some of OpenAI’s thunder — even after the o3 launch, Google still has the top model on the LMArena leaderboard — OpenAI upstaged Google by releasing its improved image generation the very same day (thus flooding the internet with Studio Ghibli-inspired images). But even if they help attract and retain customers, the new models aren’t necessarily better across the board. Some users say ChatGPT now flatters them too much, and OpenAI’s own model card notes that o3 and o4-mini hallucinate more often than the company’s earlier reasoning model, o1.
- Meta announced its Llama 4 family of models, but the rollout was marred by erratic answers and claims that Meta juiced performance to rank highly on public leaderboards. The new model family comes in three sizes: Scout, which runs on a single Nvidia H100 and boasts an industry-leading 10 million-token context window; Maverick, the mid-range workhorse model; and Behemoth, a two-trillion-parameter “teacher” model that Meta says outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro but remains in training for now. Both Scout and Maverick were released as open-weights models and are available for download through Meta or Hugging Face. At launch, Meta boasted impressive scores for Maverick: on the LMArena leaderboard, it initially had a score of 1417, good enough for second place at the time (now it would be third, behind OpenAI’s o3 model). But it appears that score was for a version of Maverick “optimized for conversationality” — not the one that was released to the public. LMArena has since removed that version of Llama 4 from the leaderboard and posted a statement on social media saying that “Meta’s interpretation of our policy did not match what we expect from model providers.” The publicly available version of Maverick remains on the leaderboard with a score of 1271, behind other open-weights models like DeepSeek’s V3 (score of 1373) and Alibaba’s QwQ-32B (1314).
- More: OpenAI — Our updated Preparedness Framework | Google DeepMind — An Approach to Technical AGI Safety and Security
- More: TSMC Warns of Limits of Ability to Keep Its AI Chips From China | Exclusive: TSMC could face $1 billion or more fine from US probe, sources say
Government Updates
Looming Chip Tariffs Keep AI Sector on Edge Despite 90-Day Pause: After weeks of whiplash — blanket “reciprocal” duties on April 2, a 90-day pause on April 9, China-specific rates raised to 145 percent on April 10 (though President Trump indicated yesterday he may reverse course), and a blanket exemption for most electronic goods on April 11 — tariff chatter has come down a bit from its peak, but AI developers, chipmakers, and the market are still waiting for more shoes to drop. For the AI world, the looming worry is a planned tariff package on chips and the electronics that house them. In February, President Trump said semiconductor tariffs could start around 25 percent and “go very substantially higher over a course of a year.” On April 14, the administration launched an investigation — under Section 232 of the Trade Expansion Act of 1962 — that could formally lead to semiconductor tariffs on national security grounds. While such investigations typically take up to 270 days to complete, Politico reports the administration is hoping to move faster. There are still questions about how such tariffs would be applied, but semiconductor expert Chris Miller thinks they could take the form of “component tariffs” — calculating the duty based on the value of foreign-made components in the final product, not just the final product itself. But as Miller points out, the complexity of semiconductor manufacturing means it’s unlikely the tariffs will bring chipmaking back to the United States in time to prevent significant price increases. Higher GPU and server costs would hit AI training budgets just as advertising-heavy giants like Google and Meta face softer consumer spending under broader tariffs. Whether the White House ultimately fine-tunes the plan is anyone’s guess; for now, the sector is stuck in wait-and-see mode.
Ambitious Tech Plan Clashes with Visa Crackdown and Funding Fights: In his first major speech since his Senate confirmation last month, OSTP Director Michael Kratsios promised to “secure America’s preeminence” in AI, quantum, and biotech through lighter regulation and “creatively allocating” federal R&D spending. The vision tracks closely with President Trump’s public March letter to Kratsios, which urged him to “blaze a bold path” and make the United States “the unrivaled world leader in critical and emerging technologies.” Kratsios inherits an innovation ecosystem already under stress from White House policies. Most notably, the administration’s approach to immigration has shown signs of disrupting a vital talent pipeline: a handful of high-profile detentions, denied entries, and revoked (then reinstated) visas involving scientists have some foreign researchers rethinking their U.S. work and travel plans, according to Nature. Even remote collaboration is under the microscope: in March, the White House Office of Management and Budget sent foreign grantees a questionnaire — asking, among other things, whether projects contain “DEI elements” — to gauge alignment with administration priorities. Immigration clamp-downs may restrict the talent pipeline, and funding pressures could hit those who stay: the administration’s high-profile fight with top universities has frozen billions in research grants, and a draft FY 2026 budget proposal circulated earlier this month signaled deep science cuts. Kratsios’ “Golden Age” could prove hard to usher in if the engines of U.S. research — its talent pipeline, collaborative ties, and federal funding base — falter.
U.S. Tightens AI Controls — Blacklists 80 Firms and Blocks Nvidia’s H20: The Trump administration took aim at China’s AI ecosystem with a handful of moves over the last month: first, the Commerce Department added 80 mostly Chinese firms to its Entity List. A few weeks later, the administration imposed an indefinite licensing requirement on Nvidia’s H20 and equally powerful chips. The entity-list expansion sweeps in server maker Nettrix, several Inspur affiliates (a major Intel customer), and the Beijing Academy of AI. The Nettrix addition points to the entity list’s shortcomings: now one of the biggest server companies in China, Nettrix was founded only half a decade ago by former executives from Sugon, a server company placed on the entity list in 2019. The department’s move to pull the plug on Nvidia’s China-specific H20 accelerator could prove more impactful in the long run than entity-list whack-a-mole. For Nvidia, the financial hit — a $5.5 billion writedown (plus the broader loss of access to the lucrative Chinese market) — is survivable, but the strategic damage could linger: with Chinese customers set to pivot to Huawei’s competing system (see above for more), one of Nvidia’s most important advantages — the ubiquity of its CUDA software — is harder to defend if one of the world’s largest AI markets is forced onto non-Nvidia hardware. While there had been some speculation that Nvidia would skirt restrictions after CEO Jensen Huang attended a $1 million-a-head dinner with President Trump at Mar-a-Lago, the administration ultimately seems to have decided the risk wasn’t worth it. With the AI ecosystem moving toward reasoning-heavy models like OpenAI’s o3 and DeepSeek’s R1, a powerful inference chip like the H20 likely seemed too capable to send to a rival power.
In Translation
CSET’s translations of significant foreign language documents on AI
- Guide to the 2025 Annual Projects for the Major Research Program on Explainable and Generalizable Next-Generation Artificial Intelligence Methods
- Implementation Opinions of the National Development and Reform Commission and Other Ministries on Promoting the High-Quality Development of the Data Labeling Industry
- Measures for the Security Management of Facial Recognition Technology Applications
- Opinions on Strengthening the Governance of Science and Technology Ethics
- (Trial) Measures for Science and Technology Ethics Reviews
- Guidelines for National Data Infrastructure Construction
What’s New at CSET
ANNUAL REPORT
In 2024, CSET continued to deliver impactful, data-driven analysis at the intersection of emerging technology and security policy. Explore our annual report to discover key research highlights, expert testimony, and new analytical tools — all aimed at shaping informed, strategic decisions around AI and emerging tech.
REPORTS
- AI for Military Decision-Making: Harnessing the Advantages and Avoiding the Risks by Emelia Probasco, Helen Toner, Matthew Burtell, and Tim G. J. Rudner
- Top-Tier Research Status for HBCUs?: The Impact of Changes to the Carnegie Classification Criteria in 2025 by Jaret C. Riddick and Brendan Oliss
- Government AI Hire, Use, Buy (HUB) Roundtable Series – Roundtable 1: Government as a User of AI by Carolina Oxenstierna, Alice Cao, and Danny Hague
- Government AI Hire, Use, Buy (HUB) Roundtable Series – Roundtable 2: Government as an Employer of AI Talent by Danny Hague, Carolina Oxenstierna, and Matthias Oschinski
- Government AI Hire, Use, Buy (HUB) Roundtable Series – Roundtable 3: Government as a Buyer of AI by Carolina Oxenstierna, Aaron Snow, and Danny Hague
- Government AI Hire, Use, Buy (HUB) Roundtable Series – Roundtable 4: Capstone by Danny Hague, Natalie Roisman, Matthias Oschinski, and Carolina Pachon
PUBLICATIONS AND PODCASTS
- Rising Tide: CSET’s Helen Toner launched her new Substack on April 1! Check out her first four posts: “Long” timelines to advanced AI have gotten crazy short, The core challenge of AI alignment is “steerability”, Nonproliferation is the wrong approach to AI misuse, and 2 Big Questions for AI Progress in 2025-2026
- CSET: How to Improve AI Red-Teaming: Challenges and Recommendations by Jessica Ji
- CSET: Exploring AI Methods in Biology Research by Steph Batalis, Catherine Aiken, and James Dunham
- Breaking Defense: Trump eliminated a key space advisory committee at the worst time by Kathleen Curlee
- Bulletin of Atomic Scientists: How to stop bioterrorists from buying dangerous DNA by Steph Batalis and Vikram Venkatram
- The National Interest: America Must Rebuild Its Physical Economy by Sam Bresnick and Jack Corrigan
- The National Interest: Artificial Intelligence, China, and America’s Next Industrial Revolution by Dewey Murdick and Bill Hannas
- TIME: The Case for a U.S.-Led Military Alliance in Space by Andrew Hanna and Kathleen Curlee
- ASPI: Stop The World Podcast: The road to artificial general intelligence with Helen Toner
- The Cognitive Revolution: OpenAI Reflections, Adaptation Buffers, and AI in Warfare with Helen Toner
- arXiv: The Impact of AI on the Cyber Offense-Defense Balance and the Character of Cyber Conflict by Andrew Lohn
EMERGING TECHNOLOGY OBSERVATORY
- DeepSeek energizes China’s chipmakers: editors’ picks from ETO Scout, volume 21 (2/27/25-3/31/25)
- Facial recognition regs, state and local docs, frontier frameworks: AGORA roundup #1
EVENT RECAPS
- On March 25, CSET hosted an in-depth discussion about AI red-teaming — what it is, how it works in practice, and how to make it more useful in the future. Watch a full recording of the event.
IN THE NEWS
- MIT Technology Review: Phase two of military AI has arrived (James O’Donnell cited the CSET report AI for Military Decision-Making)
- MSNBC: Trump can’t attack the Department of Education and expect to dominate China (Michele Norris cited the CSET report China is Fast Outpacing U.S. STEM PhD Growth)
- South China Morning Post: China’s PLA is using DeepSeek AI for non-combat support. Will actual combat be next? (Amber Wang quoted Sam Bresnick)
- TechTarget: AI companies claim existing rules can govern agentic AI (Makenzie Holland quoted Helen Toner)
- The New York Times: America’s Economic Exceptionalism Is on Thin Ice (Rebecca Patterson cited the CSET report China is Fast Outpacing U.S. STEM PhD Growth)
What We’re Reading
Report: The 2025 AI Index, the Stanford Institute for Human-Centered AI (April 2025)
Article: Driven to Self-Reliance: Technological Interdependence and the Chinese Innovation Ecosystem, Yeling Tan, Mark Dallas, Henry Farrell, and Abraham Newman, International Studies Quarterly (March 2025)
Paper: Do Large Language Model Benchmarks Test Reliability?, Joshua Vendrow, Edward Vendrow, Sara Beery, and Aleksander Madry (February 2025)