Welcome to Eye on AI. In this installment: AI surpassing some human experts…Google plans to bring ads to Gemini…leading AI companies collaborate on standards for AI agents…a new effort to give AI models longer-term memory…and shifting sentiment on LLMs and AGI.
TL;DR
- AI applications outperformed human lawyers in legal research tasks, with ChatGPT showing high precision.
- AI-generated advertisements were perceived as more impactful, but disclosure of AI origin reduced purchase intent.
- Experienced professionals benefit most from AI tools by crafting optimal prompts and assessing outputs.
- Google plans to introduce advertisements to its Gemini chatbot in 2026, distinct from Search AI Mode ads.
Hello from San Francisco, where we have just wrapped up Fortune Brainstorm AI. I'll have a rundown of key takeaways from the event on Thursday. Today, though, I want to focus on some significant research released recently that could have profound implications for how AI affects business.
First, there was a study from the AI evaluations company Vals AI that pitted several legal AI applications as well as ChatGPT against human lawyers on legal research tasks. All of the AI applications beat the average human lawyers (who were allowed to use digital legal search tools) in drafting legal research reports across three criteria: accuracy, authoritativeness, and appropriateness. The lawyers’ aggregate median score was 69%, while ChatGPT scored 74%, Midpage 76%, Alexi 77%, and Counsel Stack, which had the highest overall score, 78%.
A particularly interesting finding: across a range of query types, it was the general-purpose ChatGPT that scored highest on accuracy, beating the more specialized applications. And although ChatGPT received lower marks for authoritativeness and appropriateness, it still outperformed the human lawyers on those criteria too.
Critics have pointed out the study's shortcomings, including that it did not evaluate some prominent and widely used legal AI research platforms, such as Harvey, Legora, Thomson Reuters' CoCounsel, or LexisNexis Protégé, and that ChatGPT was the only general-purpose model tested. Nevertheless, the results are significant and align with the anecdotal accounts I've heard from legal professionals.
A little while ago I had a conversation with Chris Kercher, a litigator at Quinn Emanuel who founded that firm’s data and analytics group. Quinn Emanuel has been using Anthropic’s general purpose AI model Claude for a lot of tasks. (This was before Anthropic’s latest model, Claude Opus 4.5, debuted.) “Claude Opus 3 writes better than most of my associates,” Kercher told me. “It just does. It is clear and organized. It’s a great model.” He said he is “constantly amazed” by what LLMs can do, finding new issues, strategies, and tactics that he can use to argue cases.
Kercher said that AI models have allowed Quinn Emanuel to “invert” its prior work processes. In the past, junior lawyers—who are known as associates—used to spend days researching and writing up legal memos, finding citations for every sentence, before presenting those memos to more senior lawyers who would incorporate some of that material into briefs or arguments that would actually be presented in court. Today, he says, AI is used to generate drafts that Kercher said are by and large better, in a fraction of the time, and then these drafts are given to associates to vet. The associates are still responsible for the accuracy of the memos and citations—just as they always were—but now they are fact-checking the AI and editing what it produces, not performing the initial research and drafting, he said.
He said the most seasoned, veteran attorneys often derive the greatest benefit from working with AI, because they know how to craft the best prompt and have the professional insight to quickly judge whether the AI's output is any good. Does the reasoning the system has generated hold up? Is it likely to succeed before a particular judge or persuade a jury? Such questions still require judgment honed by experience, Kercher said.
Ok, so that’s law, but it likely points to ways in which AI is beginning to upend work within other “knowledge industries” too. Here at Brainstorm AI yesterday, I interviewed Michael Truell, the cofounder and CEO of hot AI coding tool Cursor. He noted that in a University of Chicago study looking at the effects of developers using Cursor, it was often the most experienced software engineers who saw the most benefit from using Cursor, perhaps for some of the same reasons Kercher says experienced lawyers get the most out of Claude—they have the professional experience to craft the best prompts and the judgment to better assess the tools’ outputs.
Another recent study looked at the use of generative AI to produce imagery for advertising. Business school researchers from New York University and Emory University tested which ads for beauty products appealed most to potential shoppers: those created entirely by human professionals, those created by humans and then refined by AI, or those generated entirely by AI. They found that the ads produced entirely by AI were judged the most effective, lifting clickthrough rates by 19% in an online experiment, while ads made by humans and then modified by AI performed worse than those made by humans alone. Crucially, however, when people were told the ads were AI-generated, their intent to purchase the product dropped by nearly a third.
These discoveries pose a significant ethical dilemma for companies. The majority of AI ethics experts believe individuals ought to be informed when they're encountering content produced by artificial intelligence. Furthermore, advertisers must navigate several Federal Trade Commission regulations concerning “truth in advertising.” Nevertheless, numerous advertisements already feature performers acting in diverse capacities without a requirement to explicitly state they are actors, or they do so only in extremely small print. How does advertising created by AI differ? The research suggests a future where an increasing volume of advertising will be AI-generated, with limited disclosures.
The research also appears to question the established belief that “centaur” approaches (which integrate human and AI capabilities synergistically) will consistently outperform either humans or AI independently. (This notion is occasionally summarized by the saying “AI won’t take your job. A human using AI will take your job.”) An increasing amount of investigation indicates that in numerous domains, this is not the case. Frequently, the AI operating by itself actually yields superior outcomes.
However, it's also true that the effectiveness of centaur approaches depends heavily on exactly how the human-AI collaboration is configured. For instance, a study of human physicians using ChatGPT for diagnostic assistance found that the combination could produce better diagnoses than either the physicians or ChatGPT alone. But this held only when ChatGPT produced an initial diagnosis and the human physicians then offered a second opinion after reviewing the AI's assessment. When the sequence was reversed, with ChatGPT providing a second opinion on a physician's diagnosis, outcomes deteriorated. In fact, the second-best performance came from simply having ChatGPT generate the diagnosis on its own. In the advertising research, it would have been useful for the investigators to also test the scenario where AI creates the ads and human specialists then revise them.
However, the drive for automation, frequently without direct human oversight, is gaining traction in numerous sectors.
On that happy note, here’s more AI news.
Jeremy Kahn
[email protected]
@jeremyakahn
FORTUNE ON AI
Exclusive: Glean hits $200 million ARR, up from $100 million nine months ago—by Allie Garfinkle
Cursor, the $29 billion startup, has built an internal AI help desk that can handle 80% of its employees' support requests, its CEO says—by Beatrice Nolan
HP's chief commercial officer sees a future where PCs run AI on-device and don't send data to the cloud—by Nicholas Gordon
AI IN THE NEWS
Trump allows Nvidia to sell H200 GPUs to China, but China may limit adoption. President Trump said he would permit shipments of Nvidia's advanced H200 chips to approved Chinese customers. Nvidia CEO Jensen Huang has described China as a $50 billion annual market for the company, but Beijing wants to reduce its firms' dependence on American-made semiconductors, and Chinese authorities are considering a licensing scheme that would force buyers to justify why local chips are insufficient for their needs. They may even bar state entities from acquiring H200s. Still, Chinese firms often prefer Nvidia's chips and even develop their models abroad to circumvent American export restrictions. Trump's decision has drawn political opposition in Washington, with a bipartisan group of senators trying to block the exports, though the bill's prospects are uncertain. Read more from the Financial Times here.
Trump plans executive order on national AI standard, aimed at pre-empting state-level regulation. President Trump said he intends to sign an executive order this week establishing a unified national standard for artificial intelligence, arguing that businesses struggle with a fragmented patchwork of fifty distinct state regulatory frameworks, Politico reported. The move comes after a leaked draft order from November that sought to halt state AI legislation, reigniting debate over whether national mandates should supersede state and municipal rules. An earlier effort to attach AI preemption provisions to the defense bill at the close of the year failed last week, prompting the administration to pursue the policy via executive order instead.
Google plans to bring advertising to its Gemini chatbot in 2026. That's according to a report in Adweek, citing two anonymous Google advertising clients. The report indicated that specifics regarding format, pricing, and trials were still vague. It further stated that the new advertising approach for Gemini is distinct from advertisements slated to be displayed alongside "AI Mode" queries on Google Search.
Former Databricks AI head's new AI startup valued at $4.5 billion in seed round. Unconventional AI, founded by Naveen Rao, who previously led AI at Databricks, raised $475 million in its seed round, led by Andreessen Horowitz and Lightspeed Venture Partners, at a $4.5 billion valuation. The milestone came just two months after the company's founding, Bloomberg News reported. The firm intends to build a new, more power-efficient computing architecture for AI workloads.
Anthropic forms partnership with Accenture to target enterprise customers. Accenture and Anthropic have entered into a three-year collaboration, positioning Accenture as a significant enterprise client for Anthropic. The alliance aims to help companies, a number of which remain skeptical, achieve concrete returns on their AI spending, the Wall Street Journal reported. Accenture plans to train 30,000 staff members on Claude and, in collaboration with Anthropic, will establish a specialized unit focused on heavily regulated sectors, assigning engineers to work alongside clients to speed up implementation and measure results.
OpenAI, Anthropic, Google, and Microsoft team up for new standard for agentic AI. The Linux Foundation is establishing a collective known as the Agentic Artificial Intelligence Foundation, drawing in prominent AI firms including OpenAI, Anthropic, Google, and Microsoft. Its objective is to develop common open-source guidelines that enable dependable communication between AI agents and business applications. The initiative will focus on standardizing crucial tools, including the Model Context Protocol, OpenAI's Agents.md specification, and Block's Goose agent, with the goal of promoting uniform interoperability, security measures, and contribution policies across the entire stack. Chief information officers increasingly say that standardized protocols are vital for addressing security weaknesses and ensuring agents can operate effectively in real corporate settings. Read more from The Information here.
EYE ON AI RESEARCH
Google has created a new architecture to give AI models longer-term memory. The architecture, called Titans—which Google first debuted at the start of 2025 and which Eye on AI covered at the time—is paired with a framework named MIRAS that is designed to give AI something closer to long-term memory. Instead of forgetting older details when its short memory window fills up, the system uses a separate memory module that continually updates itself. The system assesses how surprising any new piece of information is compared to what it has stored in its long-term memory, updating the memory module only when it encounters high surprise. In testing, Titans with MIRAS performed better than older models on tasks that require reasoning over long stretches of information, suggesting it could eventually help with things like analyzing complex documents, doing in-depth research, or learning continuously over time. You can read Google’s research blog here.
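The surprise-gated update described above can be sketched in a few lines of Python. This is an illustrative toy, not Google's actual Titans/MIRAS implementation: here "surprise" is simply the distance between a new vector and its nearest stored memory, and the threshold value is an arbitrary assumption.

```python
import math

class SurpriseGatedMemory:
    """Toy long-term memory: store a new item only when it is
    sufficiently 'surprising' relative to what is already stored."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.memories = []  # list of stored vectors (plain lists of floats)

    def surprise(self, x):
        # Surprise = distance to the nearest stored memory;
        # maximal when memory is empty, so the first item always qualifies.
        if not self.memories:
            return math.inf
        return min(math.dist(x, m) for m in self.memories)

    def update(self, x):
        # Write to long-term memory only on high surprise,
        # leaving familiar information un-stored.
        if self.surprise(x) > self.threshold:
            self.memories.append(list(x))
            return True   # memory updated
        return False      # item deemed familiar, skipped

mem = SurpriseGatedMemory(threshold=0.5)
mem.update([0.0, 0.0])   # first item: always stored
mem.update([0.1, 0.0])   # close to a stored item: skipped
mem.update([3.0, 4.0])   # far from everything stored: stored
print(len(mem.memories))
```

The real architecture learns its memory module continuously rather than using a fixed distance threshold, but the gating principle is the same: low-surprise inputs leave long-term memory untouched, so the short context window can forget them without losing anything novel.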
AI CALENDAR
Jan. 6: Fortune Brainstorm Tech CES Dinner. Apply to attend here.
Jan. 19-23: World Economic Forum, Davos, Switzerland.
Feb. 10-11: AI Action Summit, New Delhi, India.
BRAIN FOOD
At NeurIPS, the mood shifts against LLMs as a path to AGI. The Information reported that a significant portion of the academics attending NeurIPS, the premier AI research conference, held recently in San Diego (along with related events elsewhere), are expressing growing doubt that large language models (LLMs) will ultimately lead to artificial general intelligence (AGI). Many feel the field may need an entirely new AI paradigm to progress toward AI with human-like capabilities: continuous learning, efficient learning from limited data, and the ability to reason about and apply concepts to novel challenges.
Prominent figures like Amazon's David Luan and OpenAI's co-founder Ilya Sutskever assert that existing methodologies, encompassing extensive pre-training and reinforcement learning, are insufficient for generating models with genuine generalization capabilities. Meanwhile, novel investigations showcased at the conference delve into self-adjusting models capable of acquiring fresh information dynamically. This doubt stands in opposition to the perspective held by executives such as Anthropic CEO Dario Amodei and OpenAI's Sam Altman, who maintain that expanding current techniques can still lead to the realization of AGI. Should the skeptics prove right, it might jeopardize substantial financial commitments allocated to current training infrastructures.