| Grok Coming in Hot | By Jeff Brown, Editor, The Bleeding Edge | | The reality is that we, the human race, are no longer at the top. | And it’s a matter of our own doing. | The confluence of raw computational power (driven by parallel processing) and advancements in software has made it clear that artificial general intelligence (AGI) is within reach. | Not in a matter of years, but in a period that will be measured in months. | And whether or not the financiers who controlled the purse strings knew what they were investing in, it doesn’t matter… | The capital chasing AGI is now a flood of investment that simply will not stop. | The wheel has been pushed up and over the top of the mountain. | It is now accelerating down the other side. | It’s Getting Heavy | Several days ago, xAI released Grok 4, its most advanced, pre-AGI model. | I needed several days to sleep on it, to experiment with Grok, and to think before I wrote about it. It’s simply not accurate to call Grok 4 a large language model (LLM). Doing so would be a sign of ignorance or bias. | Ignore whatever criticisms you might read about it. Politics weigh heavily in them, and they completely miss the point. | Something extraordinary has just happened: a shocking leap ahead by the team at xAI. It’s a vindication of the team’s approach to both hardware and software architecture, as well as their foundational approach to building a maximally truth-seeking AI. | Grok 4 and its more powerful counterpart, Grok 4 Heavy, were both trained on xAI’s Colossus supercomputer, using 200,000 GPUs. What’s different from Grok 3? 
| By designing new software architecture and algorithms, xAI made Grok 4’s training 6X more compute-efficient than Grok 3’s. | xAI embarked on a “massive data collection effort” to increase the scale of its high-quality training data. | xAI expanded its verifiable training data to many more domains of knowledge beyond math and programming. | More than an order of magnitude more compute was used to train Grok 4 than Grok 3. | The result? | Nothing short of superhuman reasoning and intelligence. | Measuring General Intelligence | Longtime Bleeding Edge readers know that to measure machine intelligence, we look to the benchmarks. We’ve been tracking the rollouts of models for years now, as well as how they perform on a suite of “tests.” | When it comes to the most difficult benchmarks for AGI, no other company is even close to xAI right now. | Humanity’s Last Exam, which is designed to be extremely difficult even for human experts, consists of 2,500 questions across all academic disciplines. The questions require deep reasoning to solve. They’re not the kinds of questions that simply require knowledge retrieval, which would be too easy. | | Grok 4 Performance on Humanity’s Last Exam | Source: xAI | Grok 4 Heavy scored 44.4, compared to the previous high score of 26.9 by Google’s Gemini Deep Research. That is a huge jump. Better yet, xAI was able to demonstrate, as shown above on the right, that with additional training, Grok 4’s performance improves, in this case just over 50%. | Not satisfied? | Here are the results of the ARC-AGI 2 benchmark, another prominent measurement of an AI’s ability to reason its way through problems. Pure large language models (LLMs) score 0% on this test. | Grok 4, shown below clearly out front on top, scored a remarkable 15.9%. This score is almost double that of the previous state-of-the-art model. Double. | | ARC-AGI 2 Leaderboard | Source: ARC Prize | The ARC-AGI 2 benchmark was introduced just this May. 
It was already clear that existing AI models were making significant progress against the ARC-AGI 1 benchmark, indicating that it simply wasn’t difficult enough. | OpenAI’s o3 had scored 60.8% earlier this year, and Grok 4 came in at 66.7% on ARC-AGI 1. | | ARC-AGI Leaderboard Breakdown | Source: ARC Prize | ARC-AGI 2 is monumental. The fact that Grok 4 achieved a 15.9% (16%), double the previous high, is nothing short of remarkable. ARC-AGI 2 is designed to test higher levels of fluid intelligence, abstract reasoning, and sophisticated generalization. | Grok 4, when given time to think, is now more intelligent than pretty much any PhD-level human in any domain of study. Period. | To some, that thought may feel threatening. To others, it may feel empowering. And to those who haven’t experimented with Grok 4, I can’t encourage you enough to start using it. It is so helpful and resourceful, it is hard to describe. | And there’s more… | Grok 4 is an agentic AI. | | Agentic Grok | For those just joining us, I want to personally welcome you to The Bleeding Edge, the best place on the planet for gleaning unique insights and intel from the outer limits of high-tech development. 
| As a friendly reminder to those who have been following along for years, agentic AI is a trend we began following in early 2024. Here’s a bit of what I wrote… | | Agentic AI, or agentic reasoning, is kind of like it sounds. | The technology, the AI, is given agency. It is given the authority or directive to solve a problem or complete a task through a series of steps. | This differs from today’s LLM technology, which provides users with a zero-shot response. When we use something like ChatGPT, we give it a prompt, and then it returns us a complete response. The response is based on the information from our prompt, along with its pre-trained knowledge, and returned in a matter of seconds. | An agentic workflow is quite different. It is an iterative process, where an agentic AI uses a more human-like workflow to accomplish a task. |
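To make the contrast with a single zero-shot response concrete, here is a minimal Python sketch of an iterative agent loop. It is an illustration only, not how Grok or any xAI system is actually built. The names `toy_policy`, `lookup`, `TOOLS`, and `run_agent` are hypothetical, the "model" is a hard-coded stand-in, and the only real data are the two Humanity's Last Exam scores quoted earlier in this issue.

```python
from typing import Callable, Dict, List, Tuple

# Scores quoted in this issue for Humanity's Last Exam.
SCORES: Dict[str, float] = {"Grok 4 Heavy": 44.4, "Gemini Deep Research": 26.9}

def lookup(name: str) -> float:
    """Toy 'tool' (hypothetical): fetch a benchmark score by model name."""
    return SCORES[name]

# Registry of tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[str], float]] = {"lookup": lookup}

Action = Tuple[str, str]        # (tool name or "finish", argument)
Step = Tuple[Action, float]     # (action taken, observation returned)

def toy_policy(history: List[Step]) -> Action:
    """Stand-in for the model: pick the next action from what it has seen so far."""
    if len(history) == 0:
        return ("lookup", "Grok 4 Heavy")
    if len(history) == 1:
        return ("lookup", "Gemini Deep Research")
    # Both observations are in hand, so compute the gap and finish.
    gap = history[0][1] - history[1][1]
    return ("finish", f"{gap:.1f}")

def run_agent(policy: Callable[[List[Step]], Action], max_steps: int = 5) -> str:
    """Iterate: choose an action, call the tool, record the result, repeat."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool, arg = policy(history)
        if tool == "finish":
            return arg
        observation = TOOLS[tool](arg)
        history.append(((tool, arg), observation))
    raise RuntimeError("agent did not finish within max_steps")

if __name__ == "__main__":
    print(run_agent(toy_policy))
```

The point of the sketch is the loop: each step depends on what the previous steps returned, and the `max_steps` cap keeps a confused agent from running forever. A zero-shot call would be a single function that maps the prompt straight to an answer.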
| | In Grok’s case, it has been trained to use various software tools to get its job done: tools for programming, for example, or tools for browsing the internet for real-time information. | And Grok 4 Heavy is a more powerful version of Grok 4 that takes advantage of test-time compute. An easy way to think about that is how Grok 4 Heavy can create 5, 10, or any number of hypotheses and test them all in parallel. | This approach naturally requires more computational horsepower, which means more electricity and therefore more cost. But it also means that complex problems can be solved, or an algorithm optimized, in a fraction of the time. | And here’s the key… | The productivity improvements are going to come faster than we can imagine. | And Grok 4 comes with a new and improved voice interface, which is ridiculously easy to speak with. It’s in the palm of your hand with a smartphone now… but it’s also on our tablets and desktops, and if you have a Tesla, it’s being rolled out and integrated into all modern Tesla electric vehicle models, a topic we explored in Monday’s Bleeding Edge, The Omnipresent Grok. | | Source: The Bleeding Edge | But the bigger question is: What’s next for Grok? | Catapulting Ahead | In the next few weeks, we’re going to see major improvements in Grok’s multi-modal capabilities. | Specifically, it will receive a major upgrade in how it sees, hears, and understands the real world through audio and video inputs. | That might not sound like a big deal, but it is. | As I mentioned on Monday, Grok will have access to the cameras and audio inputs on Tesla EVs. We’ll be able to speak with Grok about what we’re seeing outside the car. And when we hold up our phone, Grok can see and hear exactly what we do, and it will understand our environment the way we do. 
| And the even more obvious application is to put Grok in Tesla’s humanoid robot Optimus, so that it can better understand its surroundings, and also better interact with us humans. | xAI is catapulting ahead. It has a competitive advantage. And this is resulting in a hyper-acceleration of technological advancement. | What’s next? Take your best guess at how far xAI and Grok will advance by the end of the year. What it accomplished in the last six months was something no other company was capable of doing. | One thing I’m sure of is that the next six months will be even more astonishing… | And with each step, we are that much closer to AGI. | Every day, we accelerate… | Can you grok it, now? | Jeff |
| Brownstone Research, 1125 N Charles St, Baltimore, MD 21201, www.brownstoneresearch.com | © 2025 Brownstone Research. All rights reserved. |