Balanced AI Insights

AI News of the Month | June 2025

Agents, Regulation, and What’s (Not) Working Yet

Anna Tiomina
July 01, 2025

This month’s newsletter is coming to you live from an airport lounge. I’m heading off for a well-deserved workation in Lisbon, where I’ll be based for the next month. If you’re also in Lisbon, shoot me a message on LinkedIn or by email (just reply to this newsletter).

And don’t be surprised if I skip an issue or two. Even newsletter authors need rest, and I might just sneak in a week of proper vacation.

But today, let’s break down what actually happened in June. Spoiler: it’s a mixed bag—some promising signals, some serious limitations.

1. AI Threatens to Reshape Wall Street’s Junior Roles

A MarketWatch report detailed how AI is already transforming roles in trading, M&A, and equity research. Tasks like pitch deck assembly, due diligence, and earnings modeling are being delegated to AI—putting entry-level analysts and associates at risk.

Some banks are shifting hiring models: hiring fewer juniors while investing in AI copilots. Others see this as a way to rebalance teams toward more strategic roles.

So What:
Wall Street is often a leading indicator of workplace automation. If junior roles in banking are being redefined by AI, finance teams in other sectors won't be far behind.

What to Do:
Start by identifying low-leverage, repetitive work in your finance org. Pilot tools that eliminate these tasks—and reallocate your team’s time to analysis, not assembly.
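
One way to make that first step concrete is a rough scoring pass over your team's task list. The sketch below is purely illustrative: the `Task` fields, the 1-to-5 scales, the scoring formula, and the sample numbers are all assumptions made for this example, not a standard method or real data.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_per_month: float  # team time spent on the task (hypothetical figures)
    repetitiveness: int     # 1 = different every time, 5 = identical every time
    judgment: int           # 1 = mechanical, 5 = needs expert judgment


def automation_score(task: Task) -> float:
    """Illustrative heuristic: more hours and repetition raise the score,
    a higher need for judgment lowers it."""
    return task.hours_per_month * task.repetitiveness / task.judgment


def rank_candidates(tasks: list[Task]) -> list[Task]:
    """Return tasks ordered from strongest to weakest pilot candidate."""
    return sorted(tasks, key=automation_score, reverse=True)


# Made-up sample data, for illustration only.
tasks = [
    Task("Pitch deck assembly", 40, 4, 2),
    Task("Earnings model review", 30, 2, 5),
    Task("Invoice data entry", 25, 5, 1),
]
ranked = rank_candidates(tasks)
for t in ranked:
    print(f"{t.name}: {automation_score(t):.1f}")
```

The exact formula matters less than the exercise: putting hours, repetition, and judgment side by side makes it easier to agree on which task to pilot first.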

2. U.S. Senate Moves to Block State-Level AI Regulation Patchwork

A bipartisan Senate bill aims to centralize AI governance at the federal level, preventing states from passing their own overlapping (or conflicting) rules. The bill cleared an important committee hurdle in June, with support from both tech advocates and business lobbies.

If you’re a CFO working across multiple states or international markets, conflicting AI rules are already a headache. But total deregulation isn’t a solution either.

So What:
If this bill passes, it could simplify how finance leaders approach AI compliance. But if state rules are blocked and no federal rules take their place, you're flying blind.

What to Do:
Push your legal or risk teams to prepare for a middle-ground scenario: some light federal rules, but still plenty of uncertainty. Update internal AI usage guidelines proactively, not reactively.

3. Anthropic Unveils Project VEND: AI Agents That Handle Real-Time Business Workflows

Anthropic launched an experimental system called Project VEND, where Claude-based agents attempted to run a small virtual business. Agents specialized in tasks like document processing, invoice review, and email communication, all coordinated by a "manager agent."

What’s fascinating: There was no fine-tuning or special training. These were out-of-the-box Claude models connected through simple prompts and workflows.

Initially, the team of agents was able to launch the business and even secure some early wins, completing onboarding and transactional tasks. But over time, the business failed. The agents began to misunderstand objectives, made inconsistent decisions, and eventually burned through the budget.

So What:
Project VEND shows both the promise and fragility of agent-based automation. Coordination and execution can happen, but autonomy is still limited.

4. New Benchmark Shows AI Agents Can Handle Just 30% of Real-World Work Tasks

Researchers led by Carnegie Mellon University released a benchmark called TheAgentCompany, simulating a digital workplace with tasks in software, admin, and finance. They tested top AI agents using models like GPT-4o, Gemini 2.5, and Claude 3.5.

Even the best agent (Gemini 2.5 Pro) fully completed only about 30% of tasks, and it struggled most with financial and admin work. Agents performed better at coding and worse at communication, file handling, and spreadsheet tasks.

I’ve seen a lot of hype around agents “replacing jobs.” This paper is the first to back things up with a rigorous test—and the results confirm what I’ve been seeing: most finance-related agent deployments are still in proof-of-concept mode.

So What:
This is the most grounded look we’ve seen at what agents can actually do. A 30% success rate—even under ideal conditions—is a reality check.

LLM Ecosystem: June 2025 Updates

ChatGPT Updates

GPT-4.1 Replaces GPT-4.5: OpenAI silently transitioned from GPT-4.5 to a refined version, GPT-4.1, offering better performance, faster response time, and reduced hallucinations.
Custom Workspaces for Teams: ChatGPT Business users gained access to shared spaces, allowing teams to collaborate on prompts, templates, and outputs—especially useful for finance and operations teams using AI in workflows.
App & API Connectors (Beta): Integration with Excel, Notion, Slack, and Google Workspace tools entered beta, allowing users to interact with live data via ChatGPT.
Memory Improvements: More stable and longer contextual memory for returning users, helping with personalized responses and more consistent outputs across conversations.

Claude Updates

Claude 4 launch: Includes flagship Opus 4 and Sonnet 4—models enhanced for complex tasks and coding, with “thinking summaries” and extended context
Prompt caching rolled out, cutting costs by up to 90%
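
To see where a figure like "up to 90%" can come from, here is a back-of-the-envelope sketch. It assumes a cache write is billed at 1.25x the normal input price and a cache read at 0.10x, and that every call lands while the cache is still live. The function name, the $3 per million tokens price, and the usage numbers are illustrative assumptions, so check current pricing before relying on them.

```python
def input_cost(prefix_tokens: int, calls: int, price_per_mtok: float,
               cached: bool, write_mult: float = 1.25,
               read_mult: float = 0.10) -> float:
    """Estimate input cost in dollars for re-sending the same prompt prefix.

    write_mult and read_mult are assumed pricing multipliers, not guaranteed.
    Assumes calls >= 1 and that the cache never expires between calls.
    """
    per_call = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return per_call * calls
    # The first call writes the cache; the remaining calls read from it.
    return per_call * write_mult + per_call * read_mult * (calls - 1)


# Hypothetical workload: a 50,000-token policy manual sent with 100 questions.
uncached = input_cost(50_000, 100, 3.0, cached=False)
cached = input_cost(50_000, 100, 3.0, cached=True)
savings = 1 - cached / uncached
print(f"Uncached: ${uncached:.2f}, cached: ${cached:.2f}, savings: {savings:.1%}")
```

Under these assumptions the savings approach 90% only when the same long prefix is reused many times. With a handful of calls, the extra cost of the first cache write eats into the benefit.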

Google Gemini Updates

Gemini 2.5 Pro & Flash go live in June with:
Deep Think mode for enhanced reasoning on complex math/coding problems
Native audio output, emotional dialogue, multilingual capabilities, and advanced security
Native scheduling in Gemini apps

Closing Thoughts

June offered a sobering but valuable look at where AI truly stands in finance. The agent hype is loud, but real-world execution is still early-stage. Regulation remains a moving target. And even the most advanced models struggle with the messiness of everyday business tasks.

But that doesn’t mean we sit back and wait. The CFOs who’ll win in this next phase aren’t the ones chasing every shiny tool—they’re the ones building systems that are resilient, testable, and human-aware.

I’ll be watching the space closely (yes, even from Lisbon), and I’ll keep sharing what’s real, what’s working, and what’s worth your time.

We Want Your Feedback!

This newsletter is for you, and we want to make it as valuable as possible. Please reply to this email with your questions, comments, or topics you'd like to see covered in future issues. Your input shapes our content!

Want to dive deeper into balanced AI adoption for your finance team? Or do you want to hire an AI-powered CFO? Book a consultation!

Did you find this newsletter helpful? Forward it to a colleague who might benefit!

Until next Tuesday, keep balancing!

Anna Tiomina
AI-Powered CFO
