Hey there. Today's issue is all about the nuts and bolts of AI development: the technical challenges and innovations driving progress in the field. From a new framework for evaluating agent-native memory systems to an open-source framework for production-ready agents, researchers and developers are working to build more efficient, effective, and flexible AI systems.

One key theme in today's stories is careful evaluation and testing. Whether the subject is the performance of large language models or the accuracy of text-to-image generation, rigorous testing and validation are essential for building trust in AI systems. There are also signs of tension and controversy in the AI community, with an accusation of illicit model extraction and debate over the role of ethics in AI development.

Overall, today's issue suggests that the AI field is continuing to evolve and mature, with a growing focus on the technical and practical challenges of building reliable, efficient, and effective AI systems.

🛠️ Build

Researchers propose framework for evaluating agent-native memory systems

Researchers propose a framework for evaluating agent-native memory systems, decomposing them into four core modules: memory representation and storage, extraction, retrieval and routing, and maintenance. They evaluate 12 representative memory systems and two reference baselines across five benchmark workloads spanning 11 datasets, finding that no single architecture dominates across all scenarios. The study reveals cost-performance trade-offs under realistic workloads, showing localized maintenance is more cost-efficient than global reorganization. The code is publicly available on GitHub.
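For readers who want a concrete picture of the four-module split, here is a minimal Python sketch. It is not the paper's code: the `MemoryStore` class, its method names, the sentence-splitting extraction, and the word-overlap ranking are all hypothetical stand-ins chosen only to show where each module sits.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy agent memory; every name here is illustrative, not from the paper."""

    # Module 1: representation and storage (a flat list of text records).
    records: list[str] = field(default_factory=list)

    def extract(self, turn: str) -> list[str]:
        # Module 2: extraction. Naively split a turn into sentence-like facts.
        return [s.strip() for s in turn.split(".") if s.strip()]

    def write(self, turn: str) -> None:
        self.records.extend(self.extract(turn))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Module 3: retrieval and routing. Rank records by word overlap.
        q = set(query.lower().split())
        ranked = sorted(
            self.records,
            key=lambda r: len(q & set(r.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def maintain_local(self) -> None:
        # Module 4: maintenance, localized variant. Only the newest record
        # is checked for older duplicates; nothing else is reorganized.
        if not self.records:
            return
        newest = self.records[-1]
        self.records = [r for r in self.records[:-1] if r != newest] + [newest]


memory = MemoryStore()
memory.write("Alice likes tea. Bob likes coffee.")
memory.write("Alice likes tea.")
memory.maintain_local()
print(memory.retrieve("what does Bob like", k=1))
```

The localized maintenance step touches only records related to the newest write, which is the kind of update the study reports as more cost-efficient than reorganizing the whole store.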

Researchers propose Implicit Visual Chain-of-Thought for structure-aware text-to-image generation

Researchers propose Implicit Visual Chain-of-Thought, a latent visual reasoning framework for query-conditioned image generation that decomposes visual conditioning queries into a structural-to-semantic cascade. The approach uses training-only sketch supervision to guide the structural queries, encouraging them to capture structure from sketches without requiring sketch extraction or intermediate decoding at inference time. As a result, the method performs implicit chain-of-thought reasoning in a single forward pass. The researchers report superior results on the GenEval and T2I-CompBench benchmarks.

Researchers investigate how reasoning unlocks parametric knowledge in large language models

Researchers published a study titled Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs, which explores how generating step-by-step reasoning traces, known to enhance performance on complex tasks, affects factual recall in large language models. The study focuses on simple, single-hop factual questions and demonstrates that allowing a model to generate a reasoning trace unlocks correct answers that are otherwise unreachable. The researchers ran a series of hypothesis-driven controlled experiments using the Gemini-2.5 and Qwen3-32B models and found two complementary mechanisms behind the effect: a computational buffer effect and factual priming. In the first, the model uses the extra tokens as a computational buffer for latent processing. In the second, the model generates facts topically related to the question, which helps it retrieve the correct answer.

Researchers introduce FLAT for geometrically accurate scene generation

Researchers introduce FLAT, a method for decoding explicit surface primitives directly from latent space, enabling high-quality 3D scene generation with improved geometric accuracy and real-time rendering capabilities. FLAT achieves significantly better geometric accuracy while maintaining competitive visual quality compared to state-of-the-art feedforward baselines. The method uses a ray-centered rotation parameterization for triangle regression and a novel product window function to improve gradient flow during differentiable triangle rendering. A lightweight test-time refinement step converts the predicted triangle soup into a fully opaque, game-engine-ready representation that supports real-time rendering.

deepset releases Haystack, an open-source AI framework for production-ready agents

deepset releases Haystack, an open-source framework for building production-ready AI agents, featuring advanced RAG pipelines with hybrid retrieval and self-correction loops. Haystack supports multimodal AI, conversational AI, and content generation, with a standardized interface for generators and Jinja2 templates for prompt flow. The framework is designed for scalable context engineering and complex decision flows, with branching and looping pipelines. Beyond text-based applications, Haystack's architecture also handles tasks such as image processing and audio transcription.

Talos team develops automated genomic reanalysis tool for rare disease diagnosis

The Talos team developed an open-source tool for automated, iterative reanalysis of genomic data in rare disease. It re-examines stored sequencing data as scientific knowledge evolves and flags variants with newly actionable evidence. Across a validation set of nearly 1,100 patients, Talos recovered 90% of in-scope diagnoses while flagging only 1.3 candidate variants per patient for expert review. Deployed across a prospective cohort of almost 5,000 undiagnosed patients, Talos delivered 241 new diagnoses, with an average of only 32 days passing between supporting evidence becoming public and the resultant diagnosis. Talos is deliberately conservative, optimized to return a small set of high-confidence variants rather than a long ranked list. The tool uses newly discovered information to tag and filter variants, then refines the candidate set using family structure and, when available, the patient’s phenotype.
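To make the tag, filter, and refine sequence concrete, here is a small Python sketch of that flow. It is an illustration only, not Talos code: the `Variant` fields, the `reanalyse` function, and the gene names are hypothetical, and real family-structure and phenotype checks are far richer than the boolean and set lookups used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    # Hypothetical flag: does the variant fit the inheritance pattern
    # seen in the patient's family?
    consistent_with_family: bool


def reanalyse(variants, newly_actionable_genes, phenotype_genes=None):
    """Return a small candidate set for expert review (illustrative only)."""
    # Step 1: tag and filter using newly published evidence.
    candidates = [v for v in variants if v.gene in newly_actionable_genes]
    # Step 2: refine using family structure.
    candidates = [v for v in candidates if v.consistent_with_family]
    # Step 3: refine using the patient's phenotype, when one is available.
    if phenotype_genes is not None:
        candidates = [v for v in candidates if v.gene in phenotype_genes]
    return candidates


stored = [
    Variant("GENE_A", True),
    Variant("GENE_B", True),
    Variant("GENE_A", False),
]
# Rerun the same stored data whenever the evidence set changes.
print(reanalyse(stored, newly_actionable_genes={"GENE_A"}))
```

Each filter only narrows the list, which matches the conservative design described above: a short set of candidates rather than a long ranking.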

🚀 Launches

Researchers introduce iLLaDA, an 8B masked diffusion language model

Researchers presented iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention, outperforming autoregressive counterparts on various benchmarks. iLLaDA was trained on 12T tokens and fine-tuned on a 25B-token instruction corpus for 12 epochs, achieving improvements of 21.6 points on BBH and 14.9 points on ARC-Challenge. The model remains competitive with Qwen2.5 7B on several benchmarks despite its non-autoregressive training. The researchers used variable-length generation for efficiency and introduced confidence-based scoring for multiple-choice evaluation. Model weights and code are available on GitHub.

Google DeepMind integrates computer use in Gemini 3.5 Flash

Google DeepMind has integrated computer use as a built-in tool in Gemini 3.5 Flash, enabling developers to build custom agents that can interact across platforms. Previously available as a standalone model, computer use is now native to the main Gemini Flash model, delivering improved performance for long-horizon and enterprise automation tasks. Mateo Quiros, Product Manager at Google DeepMind, announced the update, which unlocks reliable agent building across browser, mobile, and desktop environments. Developers can access computer use in 3.5 Flash via the Gemini API and Gemini Enterprise Agent Platform. The integration also includes targeted adversarial training and optional enterprise safeguard systems to mitigate prompt injection risks.

Z.ai releases GLM-5.2 model with improved benchmark scores

Z.ai released its latest model, GLM-5.2, on June 13th to GLM Coding Plan members, with the official MIT-licensed model weights and release blog dropping three days later. The model has shown better-than-expected results in community benchmarks, including Arena's agent leaderboard, where it is the only open model competing with OpenAI's and Anthropic's latest models. GLM-5.2 has also been praised by AI commentators and researchers after personal use, with many considering it a credible alternative to Anthropic's Claude Code. The model has been demonstrated as a general agent in coding harnesses, with some minor issues reported, such as compatibility problems with the Fireworks API.

OpenAI and Broadcom unveil LLM-optimized inference chip Jalapeño

OpenAI and Broadcom unveiled Jalapeño, an accelerator architected around OpenAI’s vision for the future of LLM inference, with early testing showing performance per watt substantially better than the current state of the art. Developed from design to production in nine months, Jalapeño is designed to work with all LLMs. The chip is built for flexibility and reduces data movement so that realized utilization comes closer to theoretical peak performance. OpenAI’s President Greg Brockman stated that Jalapeño is part of the company's long-term strategy to build the full stack behind its models and products, making compute more abundant and AI faster, more reliable, and more affordable.

🔥 Buzz

Google and Microsoft hire philosophers to tackle AI ethics

Google and Microsoft are hiring philosophers to address the ethical implications of artificial intelligence, as the technology raises complex questions about its impact on society. The hiring trend is a notable change from a decade ago, when arts and humanities students were advised to learn coding skills to become employable. Now it is programmers who worry about AI replacing their jobs. Hiring philosophers is an attempt to tackle the thorny problems posed by AI, such as its potential consequences for employment and the ethical questions it raises.

Cory Doctorow publishes The Reverse Centaur’s Guide to Life After AI

Cory Doctorow released his new book, The Reverse Centaur’s Guide to Life After AI, which serves as a follow-up to his previous work, Enshittification. Doctorow discusses the concept of a reverse centaur, a person serving as a peripheral to an uncaring machine, and expresses alarm at the hype surrounding AI. He argues that the AI industry is creating more reverse centaurs and that the bubble will eventually pop, causing catastrophic economic consequences. Doctorow also shares his ideas on how to push back against the prevailing narrative of AI’s inevitability.

📈 Business

Qualcomm to acquire Modular

Qualcomm announced plans to acquire Modular, a move that expands its capabilities in the AI sector. The terms of the deal were not disclosed. Qualcomm's acquisition is expected to enhance its position in the market, with Modular's technology likely to be integrated into Qualcomm's existing product lines. The acquisition is subject to regulatory approval and is expected to close in the coming months. Qualcomm's CEO noted the importance of the acquisition in a statement, highlighting the potential benefits for the company's customers.

🛡️ Safety

Anthropic accuses Alibaba of illicitly extracting Claude AI model capabilities

Anthropic alleges Alibaba extracted capabilities from its Claude AI model without permission, with the incident reportedly occurring in recent months. Anthropic's security team claims to have detected unusual activity related to the extraction. The company has not disclosed the extent of the extracted capabilities or the methods used by Alibaba. Anthropic's CEO has stated that the company is taking steps to prevent similar incidents in the future. Alibaba has not commented on the allegations.

I'll be back tomorrow with another issue, covering the latest developments in AI and tech. Until then, I hope you find something interesting to read in today's stories.

End of edition · 2026-06-25