
Claude Opus 4.7 is a genuine step change for finance teams: 4 ways to use it (and 3 to hold off on)

Ben Buckingham, CEO

Primary has started trialling its MCP product with customers over the past few weeks.

Quick context: Model Context Protocol (MCP) is an open standard that lets AI applications connect to external data sources, tools, and systems in real time. Primary's MCP makes treasury data, including balances, transactions, FX, yield, and other data sets, directly accessible to LLMs like Claude.
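To make that concrete, here is a rough sketch of what a single MCP tool call looks like on the wire. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `get_account_balances` and its arguments are hypothetical, chosen for illustration; they are not Primary's actual MCP interface.

```python
import json

# A JSON-RPC 2.0 request of the kind an MCP client sends to invoke a tool.
# ASSUMPTION: the tool name and arguments below are hypothetical examples,
# not the real tools exposed by any particular treasury MCP server.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_account_balances",
        "arguments": {"currency": "GBP", "as_of": "2025-01-31"},
    },
}

# Serialise to the JSON text that would travel over the transport.
wire_message = json.dumps(request)
print(wire_message)
```

The point is that the model asks for data through a structured, named request rather than relying on whatever was pasted into the chat.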

Initial feedback from CFOs has been that Claude is a genuine unlock when used as a thought partner. Finance teams are using LLMs to stress test assumptions, challenge their own logic, and surface optimisations, and querying in natural language is becoming a default part of their workflow.

Last week, Anthropic released Claude Opus 4.7 and the update matters for the finance function.

Until now, Claude has been excellent at engaging with the logic you provide, but limited when that logic is incomplete. Earlier models (including Opus 4.6) tend to interpret: filling in assumptions when instructions are vague, smoothing over data gaps, and rarely flagging contradictions across datasets. For a finance team, this is a real ceiling when precision and accuracy are absolutely necessary.

Opus 4.7 shifts the dynamic. It behaves more like an autonomous agent, operating earlier in the workflow, across more complex reconciliation and processing tasks, rather than only assessing final numbers after the manual work is done.

What has actually changed between 4.6 and 4.7?

Opus 4.7 is an agentic reliability release, meaning the update was focused not on making the model smarter but on making it more dependable when acting as an agent. In simple terms, that means executing multi-step tasks autonomously rather than just answering a single question.

For finance teams, a few areas stand out.

One: Multi-step financial analysis is meaningfully more accurate

On Anthropic's Finance Agent benchmark, Opus 4.7 scored 64.4% versus 59.7% for other LLMs. On GDPval-AA, a third-party evaluation from Artificial Analysis spanning 44 occupations and 9 major industries, it achieved the highest Elo rating of any generally available model. The Elo rating ranks models against one another on agentic performance in real-world work tasks.

In practice, this means a finance team can upload a management pack, an operating plan, and a set of actuals, and the model will hold and retain accurate information in all three rather than losing accuracy as the task gets longer. The previous ceiling where the model would start to drift or transpose numbers partway through a complex task has lifted meaningfully.

Two: It no longer fabricates when data is missing

Hex, an AI analytics platform, found that Opus 4.7 correctly reports absent data instead of generating plausible alternatives, and resists ‘dissonant data traps’ that Opus 4.6 fell for.

This is one of the largest unlocks for using the model across financial workflows. Previous versions would fill gaps with assumptions drawn from prior context or training data, which meant a CFO could receive analysis that looked complete but was built on inputs the model had fabricated to make the output coherent. Opus 4.7 is far more likely to query contradictions in data or ask for missing inputs, rather than relying on previous memory to fill the blanks.

Three: It follows instructions more literally

Previous versions of Claude would loosely interpret what you asked for. If an instruction or step was missing, Claude would infer it. That improvisation caused problems whenever a prompt did not provide enough detail.

Opus 4.7 does exactly what you tell it, and stops there. The obvious upside is precision, but the downside is that vague prompts produce vague output, because the model will no longer fill in gaps.

This means the way you prompt is going to matter more. The practical shift is toward being explicit about scope, format, and constraints up front. For example, instead of ‘analyse this data’, you need to specify what you want analysed, against what baseline, in what format, and what to flag if something does not reconcile. Show the model what good output looks like with an example rather than describing it in the abstract.
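One way to make this a habit is to template the prompt so that scope, baseline, format, and flag rules can never be left out. The sketch below is a minimal illustration under our own assumptions: the `AnalysisRequest` class, the `build_prompt` function, and the sample wording are all hypothetical, and the output is just a string you would paste or send to the model.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisRequest:
    """Hypothetical container for the details an explicit prompt should pin down."""
    scope: str
    baseline: str
    output_format: str
    flag_rules: list[str] = field(default_factory=list)
    example_output: str = ""


def build_prompt(req: AnalysisRequest) -> str:
    """Render the request as a sectioned prompt string."""
    lines = [
        f"Scope: {req.scope}",
        f"Baseline: {req.baseline}",
        f"Output format: {req.output_format}",
        "Flag the following, and do not resolve them yourself:",
    ]
    lines += [f"- {rule}" for rule in req.flag_rules]
    if req.example_output:
        lines += ["Example of good output:", req.example_output]
    return "\n".join(lines)


# Illustrative values only.
request = AnalysisRequest(
    scope="Operating expenses by department, March actuals",
    baseline="FY budget, March column",
    output_format="Table: department, actual, budget, variance, one-line comment",
    flag_rules=[
        "Any line where actuals and the ledger extract do not reconcile",
        "Any department missing from either source",
    ],
    example_output="Marketing | 120,000 | 100,000 | +20,000 | Event spend pulled forward",
)
prompt = build_prompt(request)
print(prompt)
```

Because the fields are required, a vague request fails before it ever reaches the model.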

Four: It can read your documents properly

The model now processes images at roughly three times the resolution of prior versions. For finance teams working with scanned invoices, complex charts, or dense spreadsheet screenshots, this is a practical unlock that removes a barrier which previously made the model unreliable for document-heavy workflows.

Four practical ways finance teams can use Opus 4.7

Board pack, investor update or cycle review integrity checks

Upload a CEO letter, a P&L, and an operating plan, and ask whether the narrative holds up against the numbers. Opus 4.7 can hold multiple documents in working memory and surface where they contradict each other. This is typically a time-consuming back-end task for finance teams when doing final checks and reviews of important documentation.

Month end close acceleration

Reconciling data across systems, checking completeness, and flagging anomalies is precisely where Opus 4.7's improvements in long-context consistency will have the most impact. Finance teams running AI-connected platforms with real-time transaction data can accelerate this workflow with a lower risk of silent errors.
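For readers who want to see what that check amounts to, here is a minimal sketch of the reconciliation logic itself: compare two sources by reference, and report what is missing and what does not match. The function name, the data shape (reference mapped to amount), and the sample figures are assumptions for illustration, not a description of how Claude or Primary implements it.

```python
from decimal import Decimal


def reconcile(ledger, bank, tolerance=Decimal("0.00")):
    """Compare two {reference: amount} mappings and report discrepancies."""
    # Completeness: references present in one source but not the other.
    missing_in_bank = sorted(set(ledger) - set(bank))
    missing_in_ledger = sorted(set(bank) - set(ledger))
    # Accuracy: references in both sources whose amounts differ.
    mismatched = []
    for ref in sorted(set(ledger) & set(bank)):
        diff = ledger[ref] - bank[ref]
        if abs(diff) > tolerance:
            mismatched.append((ref, diff))
    return {
        "missing_in_bank": missing_in_bank,
        "missing_in_ledger": missing_in_ledger,
        "mismatched": mismatched,
    }


# Illustrative data only. Decimal avoids float rounding on currency amounts.
ledger = {
    "INV-001": Decimal("1200.00"),
    "INV-002": Decimal("450.50"),
    "INV-003": Decimal("99.99"),
}
bank = {
    "INV-001": Decimal("1200.00"),
    "INV-002": Decimal("405.50"),
    "INV-004": Decimal("75.00"),
}
report = reconcile(ledger, bank)
print(report)
```

A useful habit is to ask the model for its findings in this same three-bucket shape, so that an empty bucket is an explicit statement rather than an omission.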

Variance commentary and flux analysis

Use Opus 4.7 to compare actuals against budget and prompt it for structured commentary that holds together across line items. The literal instruction following means it produces exactly the format and level of detail you specify, rather than drifting into generic observations.
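The arithmetic behind a flux table is simple enough to verify independently of the model. As a rough sketch, with a hypothetical function name, a 10% materiality threshold we have assumed, and made-up figures:

```python
def variance_table(actuals, budget, threshold_pct=10.0):
    """Return one row per budget line: (item, actual, budget, variance, pct, flagged)."""
    rows = []
    for item, bud in budget.items():
        act = actuals[item]
        var = act - bud
        # Percentage variance is undefined when the budget is zero.
        pct = round(var * 100 / bud, 1) if bud else None
        flagged = pct is not None and abs(pct) >= threshold_pct
        rows.append((item, act, bud, var, pct, flagged))
    return rows


# Illustrative figures only.
budget = {"Revenue": 500_000, "Payroll": 200_000, "Software": 40_000}
actuals = {"Revenue": 470_000, "Payroll": 230_000, "Software": 41_000}

rows = variance_table(actuals, budget)
for row in rows:
    print(row)
```

Whether a positive variance is favourable depends on the line (good for revenue, bad for cost), so the sign convention is worth stating explicitly in the prompt.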

Scenario modelling support

Feed the model your assumptions and ask it to stress test across multiple scenarios. Whether that is FX sensitivity, customer churn, or cost escalation, its improved structured reasoning can run through a range of outcomes and present results in a format that is board ready rather than requiring manual cleanup.
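As a worked example of the FX case, the sketch below revalues a single foreign-currency exposure under a set of rate shocks. Everything here is an assumption for illustration: the function name, the USD 1,000,000 exposure, the 0.80 GBP-per-USD base rate, and the shock sizes.

```python
from decimal import Decimal


def fx_sensitivity(exposure, base_rate, shocks_pct):
    """Revalue a foreign-currency exposure in home currency under rate shocks.

    Returns one tuple per shock: (shock_pct, shocked_rate, value, change_vs_base).
    """
    base_value = exposure * base_rate
    results = []
    for shock in shocks_pct:
        rate = base_rate * Decimal(100 + shock) / Decimal(100)
        value = exposure * rate
        results.append((shock, rate, value, value - base_value))
    return results


# Illustrative inputs only: USD receivable, valued in GBP.
exposure_usd = Decimal("1000000")
gbp_per_usd = Decimal("0.80")
scenarios = fx_sensitivity(exposure_usd, gbp_per_usd, [-10, -5, 0, 5, 10])

for shock, rate, value, change in scenarios:
    print(f"{shock:+d}%  rate={rate}  value={value}  change={change}")
```

Keeping a small deterministic calculation like this alongside the model's output gives you a fixed reference point to check its scenario tables against.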

Three areas where Opus 4.7 may not be the right fit yet

Deep web research and market intelligence

On the agentic search benchmark BrowseComp, which measures an AI's ability to search and synthesise across multiple web pages, Opus 4.7 actually regressed relative to its predecessor and trails its competitors. If your workflow involves pulling together competitive intelligence or synthesising external market data, this model is not the best-in-class tool.

Currently GPT-5.4 ranks highest, and other platforms, including Gemini, are considered cheaper for producing this kind of output.

AP invoicing and management

Opus 4.7's improved image processing is a technical step forward, but for AP and expense workflows, dedicated AP platforms are built around full integration with existing systems and have built-in audit trails. Claude reads invoices more accurately than before, but as a standalone tool it reintroduces the manual handovers that purpose-built AP tools remove.

Iterative back-and-forth refinement

Many finance teams have gotten used to working with Claude conversationally, refining an output over five or six messages. On Opus 4.7, this workflow is actually less effective. Each clarifying turn adds reasoning overhead on top of the literal interpretations from earlier turns, which means the model can start compounding misinterpretations rather than converging on what you want. If your team needs to run more conversational workflows, then previous models may be better suited.

Primary’s thoughts on Opus 4.7

Claude Opus 4.7 has meaningful productivity advantages. Agentic features are becoming commonplace for finance teams, and trialling tools as updates flow through will be imperative for growth companies to remain relevant.

From our early work integrating Claude with Primary's MCP, the tool performs like a highly capable analyst, accelerating the work finance teams are already doing. But humans need to remain at every touchpoint, querying outputs, applying judgement, and making the final call on compliance, audit, and analysis.

About Primary

Primary provides modern treasury management solutions for complete cash visibility, idle cash optimisation, and FX risk management - all in one platform.

