🛠️ Build

OpenAI expands Daybreak to democratize patching vulnerable software

OpenAI is expanding Daybreak to help democratize machine-speed patching of vulnerable software, applying its models to discover and generate patches for critical vulnerabilities in major browsers, network infrastructure, and operating systems such as FreeBSD and the Linux kernel. The company is launching an update to the Codex Security plugin that builds in what it has learned from internal and customer usage of its models, with the goal of accelerating vulnerability discovery and patching. OpenAI is also launching the full version of GPT-5.5-Cyber, which sets a new state of the art on CyberGym at 85.6%, compared with 81.8% for GPT-5.5. More than 30 open-source projects have committed to participate in the Patch the Planet initiative, including cURL, Go, Python, Sigstore, and pyca/cryptography.

OpenAI's Jason Liu describes Codex strategies for long-running work

OpenAI's Jason Liu shares strategies for using Codex as a persistent workspace for long-running projects, preserving context and managing complex workflows. Liu's whitepaper outlines approaches for breaking ambitious goals into verifiable steps and for deciding when to delegate execution to Codex and when to keep human oversight. The strategies aim to help organizations sustain progress across multiple workstreams and prompts.

Researchers introduce EnterpriseClawBench benchmark for enterprise agents

Researchers introduce EnterpriseClawBench, a benchmark for enterprise agents built from 852 reproducible tasks drawn from real-world, proprietary workplace sessions. Each task includes recovered fixtures, rewritten prompts, and semantic rubrics, and the benchmark reports metrics beyond a single performance score. The best configuration, Codex with GPT-5.5, reaches a score of 0.663. The researchers emphasize the importance of reporting harness-model combinations, artifact delivery, and other metrics in enterprise agent evaluation.

Researchers introduce PlanBench-XL to evaluate long-horizon planning of LLM tool-use agents

Researchers introduced PlanBench-XL, a benchmark to evaluate large language model agents' ability to plan and adapt in complex tool-rich environments with limited visibility and dynamic disruptions. PlanBench-XL features 327 retail tasks over 1,665 tools and includes an optional blocking mechanism to simulate real-world unpredictability. Experiments on ten leading LLMs, including GPT-5.4, show that massive-tool planning remains challenging, with accuracy dropping to 11.36% under severe blocking conditions. The benchmark highlights the need for robust adaptive planning in long-horizon tasks with large, imperfect tool environments.

Daybreak launches Patch the Planet initiative to support open-source maintainers

Daybreak introduced Patch the Planet, an initiative built with Trail of Bits to help maintainers strengthen critical open-source software, pairing AI-assisted security research with expert human review to identify and patch vulnerabilities. The initiative aims to reduce the burden on maintainers by having security engineers review findings and work with projects to develop patches and tests. Initial participants include cURL, NATS Server, and the Go project, with additional projects to join in future rounds. Trail of Bits has already identified hundreds of security issues and merged dozens of patches using AI-assisted workflows. The team has also developed reusable security infrastructure, including fuzzing harnesses and differential-testing systems.

Researchers introduce KaLM-Reranker-V1 for efficient document reranking

Researchers present KaLM-Reranker-V1, a fast reranker that is not a late-interaction model and that decouples query and passage computation using an encoder-decoder architecture with Matryoshka embedding pooling and cross-attention. KaLM-Reranker-V1 comes in three sizes, Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters, respectively. Experiments on BEIR, MIRACL, and LMEB show strong reranking performance with superior efficiency, including state-of-the-art results on BEIR and strong results on MIRACL despite limited multilingual training data.
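
To make the idea of decoupled query and passage computation concrete, here is a toy sketch in plain Python. It is not the KaLM-Reranker-V1 architecture: the mean pooling stands in for the paper's Matryoshka embedding pooling, the vectors are hand-written rather than produced by an encoder, and every function name is hypothetical. It only illustrates the general pattern of caching passage representations once and running a small cross-attention step at query time.

```python
import math


def mean_pool(vectors, n_chunks):
    """Pool token vectors into at most n_chunks averaged vectors.

    Assumption: a plain mean pool, used here only as a placeholder for
    whatever pooling the real model applies.
    """
    size = math.ceil(len(vectors) / n_chunks)
    pooled = []
    for start in range(0, len(vectors), size):
        chunk = vectors[start:start + size]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def cross_attention_score(query, pooled):
    """Attend from one query vector over cached passage vectors, then score."""
    scale = math.sqrt(len(query))
    logits = [sum(q * p for q, p in zip(query, vec)) / scale for vec in pooled]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    context = [
        sum(w / total * vec[i] for w, vec in zip(weights, pooled))
        for i in range(len(query))
    ]
    return sum(q * c for q, c in zip(query, context))


def rerank(query, cache):
    """Score every cached passage against the query, best first."""
    scored = [(pid, cross_attention_score(query, pooled)) for pid, pooled in cache.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Offline step: passage representations are computed once and cached.
cache = {
    "a": mean_pool([[1.0, 0.0], [1.0, 0.0]], n_chunks=1),
    "b": mean_pool([[0.0, 1.0], [0.0, 1.0]], n_chunks=1),
}
# Query-time step: only the query vector and the cheap attention step are new work.
ranking = rerank([1.0, 0.0], cache)
```

The point of the split is that the expensive passage encoding happens once per document, so per-query cost depends only on the small pooled cache.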

🚀 Launches

Nvidia launches Halos, a full-stack safety system for autonomous vehicles

Nvidia introduced Halos, a comprehensive safety system for autonomous vehicles that unifies safety elements across vehicle architecture, AI models, chips, software, tools, and services. The system covers the full development lifecycle with design-time, deployment-time, and validation-time guardrails, built on three computers: Nvidia DGX for model training, Nvidia Omniverse and Cosmos for simulation, and Nvidia DRIVE AGX for deployment. Nvidia Halos OS provides the unified software foundation that bridges these AI capabilities with production-ready safety. The system complements existing industry-standard safety practices and adds elements specific to autonomous vehicles, with the aim of supporting regulatory compliance and advancing safe, reliable AV stacks.

Z.ai releases GLM-5.2 model with 744B parameters and 1M context window

Z.ai's new open model GLM-5.2 delivers state-of-the-art performance across long-horizon coding, reasoning, and agentic tasks with 744B parameters and a 1M context window. The model can be run locally using Unsloth Dynamic GGUFs, with dynamic 1-bit and 2-bit quantizations reaching 76.2% and 82% top-1 accuracy respectively. GLM-5.2 is the strongest open model to date, performing on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro across Artificial Analysis and other benchmarks.

🛡️ Safety

Meta pauses AI training program tracking employee keystrokes after internal leak

Meta's Mark Zuckerberg has paused an internal AI training program after sensitive data was leaked, according to screenshots obtained by Business Insider. The screenshots showed that employees' private conversations and performance data were exposed. The incident was classified as a SEV 2 on Meta's severity scale, on which 0 is the most severe. A Meta spokesperson confirmed the incident and said the company is investigating and has seen no indication of improper access by employees. The program, called the Model Capability Initiative, was announced in April and drew backlash from employees who were uncomfortable with their data being recorded.

📈 Business

CleverCrow launches platform to fund open issues with community contributions

CleverCrow introduced a platform where users can pledge funds to back open issues in repositories, with the maintainer receiving the funds only once they start working on the issue. The platform charges a 10% fee on top of the token cost for each run, and any unspent funds are refunded to the backers. CleverCrow's model lets the community, rather than the repository owner, fund the compute costs, and provides a workflow that includes plan approval, a draft PR, and settlement. The agent runs in a credential-less sandbox, which is intended to keep the repository secure.
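
The fee and refund arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not CleverCrow's actual settlement logic: the function name `settle`, the pro-rata split of refunds across backers, and rounding to cents are all assumptions. Only the 10% fee on top of token cost and the refund of unspent funds come from the summary.

```python
from decimal import Decimal, ROUND_HALF_UP

FEE_RATE = Decimal("0.10")  # 10% fee on top of the token cost, per the summary
CENT = Decimal("0.01")


def settle(pledges, token_cost):
    """Return (fee, refunds) for one run.

    pledges: dict mapping backer name to pledged Decimal amount.
    token_cost: token cost of the run, as a string or Decimal.
    Assumption: unspent funds are refunded pro rata to each backer's pledge.
    """
    total = sum(pledges.values(), Decimal("0"))
    if total <= 0:
        raise ValueError("no funds pledged")
    cost = Decimal(token_cost)
    fee = (cost * FEE_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    charged = cost + fee
    if charged > total:
        raise ValueError("run cost plus fee exceeds pledged funds")
    unspent = total - charged
    # Per-backer rounding means refunds can differ from `unspent` by a cent or so.
    refunds = {
        backer: (unspent * amount / total).quantize(CENT, rounding=ROUND_HALF_UP)
        for backer, amount in pledges.items()
    }
    return fee, refunds


fee, refunds = settle({"alice": Decimal("60"), "bob": Decimal("40")}, "50")
```

With 100 pledged and a token cost of 50, the fee is 5, the run is charged 55, and the remaining 45 is split 27 and 18 between the two hypothetical backers.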

End of edition · 2026-06-23