99.9% of people missed this!
The biggest announcement in AI is happening in less than 30 days.
✨ Happy Friday! This is Ryan Staley of Whale Boss, where I share the latest weekly insights, prompts, and workflows to unleash the power of AI! 🔥
Here’s what we’ve got for you:
🤖99.9% of people missed the biggest announcement in AI that is happening in less than 30 days.
💎Generative AI funding reached new heights in 2024
❇️ OpenAI is turning its attention to ‘superintelligence’
😮 Google is forming a new team to build AI that can simulate the physical world
🤖 Google’s Daily Listen AI feature generates a podcast based on your Discover feed
😎 Microsoft rolls back its Bing Image Creator model after users complain of degraded quality
🤖 99.9% of people missed the biggest announcement in AI that is happening in less than 30 days
A new model will be released at the end of January
Less than 4 weeks after its PhD-level predecessor
It has tested almost as high as the Head of Research at OpenAI at programming.
OpenAI's o1 (PhD-level reasoning) was fully released at the beginning of December
o3 will be released at the end of January, per Sam Altman during Shipmas.
Here's what the next model jump means:
1. Example Benchmarks
🔹Competition-Level Math
- o1 Accuracy (Hypothetical): ~75–80%
- o3 Accuracy (Hypothetical): ~95–97%
- Differential: 15–20+ percentage points
🔹Advanced Reasoning / Complex Problem-Solving
- o1 might handle basic multi-step reasoning but may drop to 70–80% accuracy on more intricate or domain-specific problems.
- o3 is more likely to be 90%+ accurate in the same tasks.
🔹Domain-Specific QA (e.g., specialized medical or legal questions)
- o1 can be directionally helpful but might miss nuanced details.
- o3 demonstrates clearer, more detailed explanations with fewer factual or logical errors—often a double-digit percentage improvement in correctness.
2. Why the Gap Varies
🔹Type of Task
- Simple tasks (basic Q&A) may see a smaller gap (e.g., 5–10 points).
- Complex tasks (competition math, specialized legal logic) show a larger gap (10–20+ points).
🔹Data Quality & Prompt Design
- The better you structure your prompts and feed relevant context, the more performance improves, especially for “o3,” which can capitalize on nuanced cues.
🔹Model “Headroom”
- “o1” sometimes hits a plateau on advanced logic or large data correlations, whereas “o3” has more headroom to process deeper patterns.
3. Interpreting “Expected % Outcome Differential”
🔹Accuracy/Success Rate: This could mean how often the model’s answers or suggestions align with correct real-world outcomes.
🔹Completion/Throughput: In some workflows (like summarizing or coding tasks), “o3” may handle 10–25% more complex queries before requiring human review.
🔹Error Reduction: In high-stakes tasks (financial forecasting, medical Q&A), reducing error rates by even 5–10% can have an outsized business or clinical impact.
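To make the difference between “percentage points” and “error reduction” concrete, here is a minimal Python sketch. It uses the midpoints of the hypothetical competition-math ranges above (o1 at about 77.5%, o3 at about 96%). These are illustrative figures, not published benchmark results, and the function name `differential` is made up for this example.

```python
def differential(acc_old: float, acc_new: float) -> dict:
    """Compare two accuracy figures given as fractions between 0 and 1."""
    if not (0 <= acc_old < 1 and 0 <= acc_new <= 1):
        raise ValueError("accuracies must be fractions, and acc_old must be below 1")
    # Absolute gain, expressed in percentage points
    point_gain = (acc_new - acc_old) * 100
    # Error rate is whatever the model gets wrong
    err_old, err_new = 1 - acc_old, 1 - acc_new
    # Relative share of the old errors that the new model eliminates
    error_reduction_pct = (err_old - err_new) / err_old * 100
    return {
        "point_gain": round(point_gain, 1),
        "error_reduction_pct": round(error_reduction_pct, 1),
    }

# Hypothetical midpoints (assumed for illustration only)
result = differential(0.775, 0.96)
print(result)
```

With these assumed inputs, an 18.5-point accuracy gain works out to roughly an 82% cut in errors, which is why a modest-looking bump can matter a lot in high-stakes work.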
😮 Generative AI funding reached new heights in 2024
Generative AI Investment Highlights (2024)
Record Funding: $56B raised (+192% from 2023) across 885 deals.
Major Rounds: Databricks ($10B), xAI ($6B), Anthropic ($4B), OpenAI ($6.6B).
M&A Activity: $951M in deals, excluding major acqui-hires by Google ($2.7B) and Microsoft ($650M).
Global Standouts: Moonshot AI (China, $1B), Mistral (France, $640M), DeepL (Germany, $300M).
2025 Challenges: Risk of oversaturation, high computing costs, and investor pressure on revenue growth.
Infrastructure Wins: Data centers like Crusoe ($600M) and Lambda ($320M) thrive as AI infrastructure spending nears $250B/year. Read the full story here.
🚀OpenAI is turning its attention to ‘superintelligence’
OpenAI believes it knows how to build AGI and is shifting its focus to superintelligence, which Altman predicts could arrive within a few thousand days.
Potential Impact: Superintelligent AI could revolutionize science and innovation, boosting prosperity and transforming the workforce.
Challenges: Current AI limitations include errors, hallucinations, and high costs. OpenAI acknowledges unresolved issues in controlling superintelligent AI.
Safety Concerns: Despite risks, OpenAI has disbanded safety teams and seen key researchers depart over commercial ambitions.
Corporate Restructuring: OpenAI is restructuring to attract investors, with Altman defending its safety record amidst criticism.
Altman remains optimistic but stresses the need for care in developing superintelligent systems. Read More.
😮Google is forming a new team to build AI that can simulate the physical world
Led by Tim Brooks, the team at Google DeepMind will develop AI models to simulate the physical world, building on projects like Gemini, Veo, and Genie.
Focus: Real-time interactive AI for applications in gaming, simulation, and robotics.
Concerns: Potential job disruption in creative industries and copyright risks with training data.
Google’s approach to collaboration and data usage remains under scrutiny.
Read the full story here.
🤖Google’s Daily Listen AI feature generates a podcast based on your Discover feed
"Daily Listen" is a personalized, AI-powered podcast that summarizes topics from your Discover feed in a 5-minute audio format.
Availability: Rolling out on the Google app for Android and iOS in the U.S. for users in the Search Labs experiment.
Features:
- Provides audio overviews of stories based on your interests.
- Includes links to related stories for deeper exploration.
- Offers a written transcript alongside the podcast.
Personalization: Topics are curated from your Google Search and Discover activity for a tailored audio experience.
This quick, AI-driven feature helps users stay updated on their favorite topics efficiently.
😮Microsoft rolls back its Bing Image Creator model after users complain of degraded quality
Microsoft upgraded Bing Image Creator with OpenAI’s DALL-E 3 PR16 model, promising faster and higher-quality image generation. However, users reported poor results, describing the images as less realistic, cartoonish, and lacking detail.
Complaints on X and Reddit led Microsoft to announce a rollback to the previous PR13 model while addressing the issues.
Despite internal benchmarks showing slight improvements, user preferences highlighted a disconnect between metrics and real-world expectations.
The incident underscores the difficulty in aligning AI upgrades with public reception.
Ready to transform your workflows and scale with AI? Don’t miss out: explore the game-changing updates and strategies in this newsletter.
Let’s build a smarter tomorrow together! ✨
What did you think of today's newsletter? Your feedback helps us create the best newsletter possible.