It would seem Gemini 2.0 Thinking's spotlight was short-lived, as OpenAI remains adamant about having this year's last "ho ho ho," showcasing the as-yet-unreleased potential of o1's successor - o3. Other key highlights include:
- OpenAI's GPT-5 project is reportedly months behind schedule, with current improvements not justifying the costs
- Elon Musk's xAI continues raking in investor money as it adds another $6 billion to the pile
- Donald Trump picks Indian-American entrepreneur Sriram Krishnan as his administration's senior AI policy advisor
Join us at AI Tangle as we untangle this week's happenings in AI!
|
OpenAI's long run of 12 days of shipmas finally came to a close last Friday with the biggest present of them all: o3, a family of models set to succeed the company's world-class reasoning model o1. During a company livestream co-hosted by CEO Sam Altman, OpenAI showcased the frontier reasoning models, including the fully fledged o3 model and the more compact o3-mini for specialized tasks. Though OpenAI's claim of o3 being AGI may be subjective, it can't be denied that the model has set the bar high for competitors.
How much of a step up is o3 from o1?
By leveraging its private "chains of thought," o3 can "think" through problems, correcting itself and avoiding common AI pitfalls. This helps it excel on benchmarks like SWE-Bench Verified (a 22.8-point improvement over o1) and Codeforces (a 2727 rating, equivalent to #175 on the leaderboard). The most impressive results came on ARC-AGI and especially Epoch AI's FrontierMath benchmark, where o3 solved 25.2% of the problems - no other model exceeds 2%.
However, despite OpenAI's AGI claims and o3's blockbuster scores, ARC-AGI co-creator François Chollet detailed in a blog post how the model still fails on "very easy tasks" and that the higher compute setting cost 172x as much as the lowest setting, which in o3's case is a jump from tens of dollars to thousands - impractical numbers for real-world applications.
As for the name, Sam Altman claimed during the livestream that o2 was avoided to steer clear of potential conflicts with British telecom company O2. The model is not yet publicly available, but a private beta for safety researchers to test o3-mini has already been opened up, with plans to launch at least the smaller model towards the end of January.
|
However, it's not all sunshine and rainbows for OpenAI: a report by The Wall Street Journal found that the company's fabled Orion project, GPT-5, may be over budget, months behind schedule, and facing technical challenges. The report outlines how OpenAI has been exhausting every avenue for training data, from creating synthetic data with o1 to striking licensing deals to hiring developers to write fresh code. Though GPT-5 is expected to be a major upgrade, after 18 months in development and two large training runs, the model allegedly isn't yet performing well enough to justify its exorbitant costs.
In other news, Elon Musk's fledgling AI startup xAI has pooled another $6 billion in Series C funding, per a recent filing and its own blog post, with participation from BlackRock, a16z, Sequoia, and more. Now sitting at a $40 billion valuation with a combined $12 billion raised, xAI will put the funds towards further Grok AI model development and the expansion of its Memphis-based supercomputer, "Colossus," which uses 100,000 Nvidia GPUs and will eventually double in size.
In a Sunday post on Truth Social, US President-elect Donald Trump appointed Indian-American entrepreneur Sriram Krishnan as his administration's "Senior White House Policy Advisor on Artificial Intelligence." Krishnan, with experience at Microsoft, Twitter, and Facebook, will shape AI policy and work with the President's Council of Advisors on Science and Technology. He will work alongside David O. Sacks, holder of the newly created (and oddly named) title of "White House AI & Crypto Czar," to drive US leadership in AI innovation.
xAI has also been testing a standalone iOS app for its Grok chatbot, bringing GenAI features like text rewriting, Q&A, summarization, and image generation to a native app. The chatbot has access to real-time data from the web and the X social media platform, and xAI plans to launch a dedicated Grok.com website soon. Though it was previously exclusive to X Premium subscribers, Grok was made available to a broader audience with a platform-wide free version released earlier in December.
Machine learning (ML) observability platform Aporia recently announced that it had been acquired by fellow observability startup Coralogix for an undisclosed amount. Aporia specializes in monitoring ML models for issues like data drift, bias, and performance degradation while providing guardrails to protect against risks like hallucinations and data leaks. The acquisition will help Coralogix integrate Aporia's AI-powered observability approach with its own more traditional style of software monitoring. Additionally, as part of the deal, Coralogix will be launching a dedicated AI research center to focus on solving more of AI's fundamental problems.
|
NotebookLM - NotebookLM is an AI-powered research assistant built on Gemini 2.0's multimodality, designed to help users summarize and connect information from multiple sources, ranging from PDFs and websites to videos and audio files.
Steer - Made for professional communication, Steer helps users improve their wording in any application with its quick and concise AI assistant, from touching up email drafts to polishing everyday messages.
Afforai - Afforai is an AI-integrated reference manager that helps researchers reliably manage, annotate, and cite papers and conduct literature reviews.
Parsio - Extract structured data automatically with Parsio, the AI-powered document parser for your PDFs, emails, and other documents.
|
o3 is a Statement on Reasoning Models & AI's Future (22-min watch)
AI Explained, a popular YouTube channel covering the latest in tech and AI, recently took a deep dive into the nitty-gritty of OpenAI's o3, covering the highlights, hidden costs, broken benchmarks, and what comes next.
|
What did you think of this newsletter? Let us know!