ChatGPT for salary benchmarking: How reliable is it?


8 min read ・ 15 Jun 2026

Testing AI-generated salary benchmarks vs Ravio’s HRIS-integrated benchmarks
4 reasons ChatGPT salary benchmarking is unreliable
AI tools for salary benchmarking vs real-time benchmarking platforms
Compensation benchmarking needs more than web-scraped salary averages
FAQs

AI tools are getting good enough that asking ChatGPT to benchmark a salary feels like a reasonable shortcut.

It's free, it's instant, and the output sounds authoritative.

But is it actually usable for compensation decisions – or does it create more problems than it solves?

The short answer: ChatGPT draws on broad, publicly available data that isn't validated, isn't mapped to your job architecture, and has no consistent update cycle.

For a sense-check, that might be fine.

For setting market-competitive salaries that meet hiring and retention goals (especially for niche or emerging roles or locations), the gap between what it returns and what a specialist compensation benchmarking source returns is significant.

To show that gap concretely, we gave ChatGPT two benchmarking prompts and compared the output with benchmarks from Ravio. Here's what we found.

Testing AI-generated salary benchmarks vs Ravio’s HRIS-integrated benchmarks

Rather than running a detailed research analysis that would be hard to replicate, we gave ChatGPT two simple prompts – one asking for salary data for a role, the other asking it to build salary bands for another role – and compared the results with data from Ravio.

What we found is rather interesting:

Prompt 1: How much should we pay a Senior Product Designer in Estonia?

The salary data ChatGPT returned:

€55,000-€75,000 base salary for most well-funded startups and scaleups
€70,000-€90,000+ for top-tier international or remote-first tech companies
Around €65,000-€75,000 as the current market sweet spot in Tallinn-based tech hiring.

Note that ChatGPT draws on free salary data sources, which are typically unverified and offer little transparency into how the data was produced.

A few things worth noting about what ChatGPT returned:

The range spans €20,000, with no indication of what percentile the floor or ceiling represents – so you don't know whether you're looking at the 25th or the 90th – or what job level it’s mapped a ‘Senior’ role to.
The sourcing is opaque. ChatGPT pulls from publicly available free data – job boards, Glassdoor, salary aggregator sites, marketing-produced salary guides – which are usually simple averages drawn from user-reported salaries, with no disclosed methodology for how those figures are weighted, validated, or dated.
There's no peer group definition. "Well-funded startups and scaleups" and "top-tier international companies" are very vague groupings, with no indication of which companies are actually included, and whether they represent your talent competitors.
There's no job level mapping. "Senior Product Designer" is ambiguous. One company's P4 is another's P5, and the pay difference between those levels can be significant – which is why like-for-like comparison is the foundation of accurate market benchmarking. Most purpose-built benchmarking tools will help with that process (like Ravio mapping your job architecture to a standardised levelling framework), so that all figures are comparable across companies.
There's no target percentile context. A salary range without a percentile anchor isn't a benchmark – it's a spread. Paying at the 50th percentile and paying at the 75th are different strategies with different cost and talent implications, and any reliable benchmarking source will let you see the full picture. Ravio provides benchmarks at the 10th, 25th, 50th, 75th, and 90th percentiles as standard, with custom percentiles available in 5% increments. ChatGPT returns a range, but it isn't tied to a defined market position.
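To make the percentile point concrete, here is a minimal Python sketch of the difference between a bare range and a percentile-anchored view of the same data. The `percentile_summary` helper and the sample salaries are hypothetical, invented purely for illustration; they are not Ravio or ChatGPT figures, and the interpolation method is an assumption rather than any provider's methodology.

```python
from statistics import quantiles


def percentile_summary(salaries, points=(10, 25, 50, 75, 90)):
    """Return {percentile: value} for the requested percentile points.

    Hypothetical helper for illustration only. It uses the 'inclusive'
    method, which treats the sample as the whole population described.
    """
    cuts = quantiles(salaries, n=100, method="inclusive")  # 99 cut points
    return {p: cuts[p - 1] for p in points}


# Made-up base salaries in EUR (illustrative placeholders, not market data).
sample = [52_000, 58_000, 61_000, 64_000, 66_000,
          68_000, 70_000, 73_000, 77_000, 85_000]

summary = percentile_summary(sample)

# A bare range only tells you the floor and ceiling...
print(f"Range: {min(sample):,} to {max(sample):,}")

# ...whereas percentiles tell you where a given figure sits in the market.
for p, value in summary.items():
    print(f"P{p}: {value:,.0f}")
```

The same range could hide very different distributions, which is why a floor and ceiling alone can't tell you whether an offer sits at the 25th or the 90th percentile.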

So how does this compare to Ravio's benchmarks for the same ask?

For this comparison, we've used Ravio's P4 Product Design benchmark (the most typical job level for a senior IC role) at the 50th percentile (the most typical pay positioning).

Ravio's benchmark for a P4 Product Designer in Estonia is €68,300 at the 50th percentile (median).

Figure: Ravio benchmarks for a Senior Product Designer in Estonia

If we compare this to the AI-sourced range, we can see that it's broadly in the right territory – ChatGPT said €55,000-€75,000, and Ravio's 50th percentile sits at €68,300.

But "in the territory" isn't the same as usable.

The AI-generated result looks plausible at a glance, but plausible isn’t the same as accurate – and unverified salary data means any decisions you use it for are hard to defend internally – to managers, leaders, employees, and for legal compliance.

And as the benchmarking task gets more specific, the gap between plausible and accurate continues to widen.

Prompt 2: Can you build me a set of salary ranges for Data Engineering in Germany?

Next, we asked ChatGPT to build salary bands for the Data Engineering role in Germany for all job levels:

It gave us this – again drawing from publicly available but unreliable free salary data sources, such as NexaTalent (a recruiting firm) and the Stepstone job search board:

The same verification problems apply as before – no disclosed methodology, no peer group definition, no percentile anchoring.

But there's an additional issue with salary bands specifically: ChatGPT returns ranges with no explanation of how they were constructed.

Salary bands aren't a direct output of benchmarking – they're a design decision.

The bands you build depend on your compensation philosophy: which percentile you set as the midpoint, how wide each band runs, and how you manage progression between levels.

But with the ChatGPT output there's no compensation philosophy behind them, no band width rationale, and no guidance on how to handle progression between levels.

If you adopted them internally, you'd have no defensible basis for why the bands are structured the way they are or how to use them for fair, consistent compensation decisions.

To compare this output to what you’d build using a reliable benchmarking source, we’ve again used Ravio’s HRIS-integrated benchmarks as our data source.

To build our bands we’ve used the 50th percentile as the midpoint and applied a 15% spread either side – a common setup, but by no means the only approach to building bands.

So using Ravio’s benchmarks above, your bands would look like this:

Job level	Band minimum	Band midpoint	Band maximum
P1	€49,500	€58,200	€66,900
P2	€57,500	€67,700	€77,900
P3	€67,200	€79,200	€91,100
P4	€78,800	€92,700	€106,600
P5	€92,800	€109,200	€125,600
M1	€66,000	€77,600	€89,200
M2	€88,200	€103,800	€119,400
M3	€98,000	€115,300	€132,600
M4	€108,800	€128,000	€147,200
M5	€120,900	€142,300	€163,600
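The band construction described above (50th percentile as the midpoint, 15% either side) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `build_band` helper is hypothetical, the midpoints are taken from the table, and rounding to the nearest €100 is our assumption, so individual figures may differ from the published table by a rounding step.

```python
def build_band(midpoint, spread_pct=15, round_to=100):
    """Return (minimum, midpoint, maximum) for a band centred on `midpoint`.

    Hypothetical helper: `spread_pct` is the percentage applied either side
    of the midpoint, and `round_to` is an assumed rounding step in EUR.
    """
    def _round(value):
        return int(round(value / round_to) * round_to)

    minimum = _round(midpoint * (100 - spread_pct) / 100)
    maximum = _round(midpoint * (100 + spread_pct) / 100)
    return minimum, midpoint, maximum


# Midpoints (50th percentile) copied from the table for three levels.
midpoints = {"P1": 58_200, "P4": 92_700, "M4": 128_000}

for level, midpoint in midpoints.items():
    low, mid, high = build_band(midpoint)
    print(f"{level}: {low:,} / {mid:,} / {high:,}")
```

Changing `spread_pct` or the percentile used as the midpoint is exactly the kind of design decision that follows from your compensation philosophy rather than from the benchmark itself.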

Of course, this still isn’t the final salary band structure.

Reward teams would typically review the progression between levels and apply smoothing where needed to avoid awkward jumps, overlaps, or inconsistencies between bands.

That’s exactly the kind of compensation design logic ChatGPT does not automatically factor in.

It can generate a salary range, but it does not guide you through the decisions needed to turn reliable benchmarks into usable, defensible salary bands.
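As a rough illustration of what a progression review might look at, the Python sketch below measures the midpoint step and the overlap between adjacent IC bands. The band figures are the P1 to P5 rows of the Ravio-informed table; the `progression_report` helper and the choice of these two metrics are our assumptions for illustration, not a prescribed Ravio method.

```python
# IC bands from the Ravio-informed table: level -> (minimum, midpoint, maximum) in EUR.
bands = {
    "P1": (49_500, 58_200, 66_900),
    "P2": (57_500, 67_700, 77_900),
    "P3": (67_200, 79_200, 91_100),
    "P4": (78_800, 92_700, 106_600),
    "P5": (92_800, 109_200, 125_600),
}


def progression_report(bands):
    """Return (lower, upper, midpoint step in %, overlap in EUR) per adjacent pair.

    Hypothetical helper: overlap is how far the lower band's maximum
    extends above the next band's minimum (0 if the bands don't overlap).
    """
    levels = list(bands)
    rows = []
    for lower, upper in zip(levels, levels[1:]):
        _, lower_mid, lower_max = bands[lower]
        upper_min, upper_mid, _ = bands[upper]
        step_pct = (upper_mid - lower_mid) / lower_mid * 100
        overlap = max(0, lower_max - upper_min)
        rows.append((lower, upper, round(step_pct, 1), overlap))
    return rows


for lower, upper, step_pct, overlap in progression_report(bands):
    print(f"{lower} -> {upper}: midpoint step {step_pct}%, overlap EUR {overlap:,}")
```

Uneven steps or unusually large overlaps between neighbouring levels are the kind of signals a reward team would examine before smoothing the structure.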

As for the accuracy of the benchmarks themselves, if we compare the ChatGPT vs Ravio data side-by-side, again we can see that the lower levels look broadly plausible.

Figure: AI-generated salary benchmarks vs Ravio's salary benchmarks

ChatGPT's P1 and P2 ranges read €50k to €65k and €65k to €85k, which is similar to the bands made using real-time benchmarks: €49,500 to €66,900 for P1 and €57,500 to €77,900 for P2.

But the P5 ranges? Completely off the mark. Where ChatGPT’s pay band for P5 is €145k to over €200k, the market reality sits between €92,800 and €125,600 for a P5 Data Engineering role in Germany.

The gap is significant:

ChatGPT's minimum (€145k) is already €19,400 higher than Ravio's maximum (€125.6k).
At the top end, ChatGPT's €200k estimate is €74,400 higher than Ravio's maximum.
The ChatGPT range is roughly 15%–60% higher than the top of the Ravio-informed band.
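The arithmetic behind these three figures can be checked with a short Python sketch. The inputs come from the P5 comparison above; the `gap_vs` helper name is hypothetical, and treating "over €200k" as exactly €200,000 is a simplifying assumption.

```python
def gap_vs(reference, value):
    """Return (absolute gap in EUR, gap as a percentage of `reference`)."""
    diff = value - reference
    return diff, diff / reference * 100


ravio_p5_max = 125_600           # top of the Ravio-informed P5 band
chatgpt_p5_min = 145_000         # bottom of ChatGPT's P5 range
chatgpt_p5_max = 200_000         # assumed value for "over EUR 200k"

for label, figure in (("minimum", chatgpt_p5_min), ("maximum", chatgpt_p5_max)):
    diff, pct = gap_vs(ravio_p5_max, figure)
    print(f"ChatGPT {label} vs Ravio maximum: +EUR {diff:,} (+{pct:.0f}%)")
```

Because both comparisons use the Ravio maximum as the reference point, the percentages describe how far ChatGPT's entire range sits above the top of the band, not above its midpoint.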

If you were to use ChatGPT to build salary bands, you could end up positioning a P5 Data Engineer as if they were a much more senior or differently scoped role.

That can easily lead to inflated salary bands, higher payroll costs, pay compression between levels, and compensation decisions that are difficult to justify against the market.

Long story short, ChatGPT can return figures that look plausible for some roles, levels, and locations where more public salary data exists.

But what it returns isn't a benchmark – it's a pattern match on whatever public data it was trained on. There's no way to verify how current that data is, which companies it reflects, or how the figures were derived.

And at senior levels, in specialist roles, or in markets with thinner public coverage, that distinction shows heavily in the numbers you receive.

Result: Relying on AI tools for salary benchmarking can lead to costly compensation mistakes

Using ChatGPT for salary benchmarking gives you figures with no methodology behind them, no peer group definition, no market filters, and no percentile anchoring.

At junior levels for common roles, where public salary data is more plentiful, those figures might land surprisingly close to the market – close enough to feel credible.

But the less consistent the public data for a role – whether that's sparse coverage for senior or specialist functions, or wildly divergent figures for emerging titles and less-covered markets – the further the output drifts from reality.

Rely on that data for compensation decisions and structures, and you’ll quickly find yourself overpaying for roles where the figures skewed high, losing candidates where they skewed low, and building salary bands that don't hold up when scrutinised – by a hiring manager, your finance team, or an employee who's done their own research.

Need help making the business case for buying comp benchmarks? ROI of reliable compensation benchmarks: How to justify the investment to leadership

4 reasons ChatGPT salary benchmarking is unreliable

ChatGPT salary benchmarking can feel fast, easy, and surprisingly convincing.

The output can look credible even while the underlying benchmark logic is flawed – because ChatGPT isn’t a benchmarking tool, it’s just aggregating publicly available data to give you an answer that feels confident.

That makes it difficult to rely on ChatGPT for salary benchmarks when making high-stakes compensation decisions. The four main reasons:

Weak and unverified compensation data
Lack of market specificity for real hiring scenarios
Inconsistent job levelling and role comparability
Limited transparency into how benchmarks are generated and verified.

Let’s take a closer look at each.

1. Unreliable data sourcing

ChatGPT pulls pay data from publicly available sources, such as job boards, Glassdoor, ungated industry-specific compensation surveys, and salary ranges shared in job postings on platforms like Indeed and LinkedIn.

The problem with this? Free salary data isn’t benchmarking data. It’s rarely verified, standardised, or consistently updated – making it risky to rely on it for compensation decisions.

For instance:

Glassdoor crowdsources salary data from website users, with limited visibility into how submissions are verified. Even small reporting inconsistencies around compensation, job level, or location can distort salary averages over time. Published job ranges are also typically the average of all salaries ever submitted for that role, so there’s no way to know if the data reflects the current market.
LinkedIn, Indeed, and other job board salary ranges are often based on small sample sizes that are not statistically representative of the market. There’s also little on companies’ compensation philosophy and target percentile, so there’s no way to tell what a published salary range actually reflects.
Publicly available industry compensation surveys are also based on self-reported, unstandardised data, making accuracy doubtful. Plus, data per role is limited, making it difficult to create reliable compensation benchmarks.

So even though a quick prompt gives you a salary range, you still lack the context needed to make fair, consistent, and defensible pay decisions:

How large and representative is the sample size?
Does the data reflect current hiring activity?
Is the benchmark mapped to comparable job roles?
What percentile are contributing companies targeting?
Are equity, bonuses, or location adjustments included?

This creates a major data reliability problem.

Meaning: AI tools can generate salary estimates quickly, but speed doesn’t make the underlying data reliable enough for pay decisions – increasing the risk of overpaying, underpaying, or creating inconsistent salary bands across teams.

2. Lack of market specificity

Compensation benchmarking is highly context-dependent.

A salary range is only useful if it reflects the specific market you’re hiring in, the type of company you’re benchmarking against, and the full compensation package attached to the role.

That means accurate benchmarks often need context around:

City-level market differences
Remote versus office-based pay
Industry-specific compensation patterns
Company stage and size
Role specialisation
Equity, bonus, and benefits structures.

This is another area where ChatGPT benchmarks become unreliable.

ChatGPT’s data source – any old publicly available free salary data – is often too broad to capture these differences properly.

For example, a generic benchmark for “Software Engineer in the UK” doesn’t reflect what a VC-backed AI company in Cambridge needs to pay to compete for a top senior AI infrastructure engineer.

The same issue becomes more obvious when benchmarking:

Emerging roles with limited historical benchmark data available
Niche technical positions where responsibilities vary significantly between companies
AI and machine learning specialisations that evolve faster than public salary datasets can keep up with
Region-specific hiring markets where compensation can differ substantially between cities and talent hubs
Senior leadership roles where compensation packages are often highly customised
Startup roles where equity or variable pay make up a significant portion of total compensation, making base salary alone misleading.

Broad public salary averages are rarely specific enough to support accurate compensation decisions.

3. Inconsistent (or non-existent) job levelling

Salary data is only meaningful when you’re comparing equivalent roles, responsibilities, and seniority levels across companies.

One company’s “Senior Product Manager” may operate at another company’s mid-level scope – making consistent job levelling a critical part of accurate compensation benchmarking, and another major limitation of ChatGPT benchmarks.

Public job titles alone do not provide enough context for accurate benchmarking.

And AI tools cannot reliably infer seniority, scope, or responsibility from job titles only.

So you’ll need to provide ChatGPT with detailed internal context around your:

Job architecture and level frameworks
Standardised job titles and role definitions
Clear responsibilities mapped to each level
Location information.

The challenge here is that many companies don’t have perfectly standardised job levelling internally to begin with – especially fast-growing companies with evolving team structures.

And even when they do, ChatGPT still relies heavily on the quality and consistency of the information you give it.

At the same time, the public salary data ChatGPT pulls from often lacks consistent and verified job levelling itself, making it difficult to confirm whether external salary benchmarks truly reflect the comparable roles and seniority levels in your internal structure.

This becomes even harder for roles that don’t map cleanly to standard market benchmarks, such as:

Hybrid roles with responsibilities spanning multiple functions
Roles at fast-growing startups where job scope and levelling are still evolving
Emerging or poorly defined positions with limited benchmark data available
Cross-functional roles that don’t align neatly with standard market job titles.

In these situations, AI tools have an even harder time identifying truly comparable market roles, increasing the risk of benchmarking the wrong role or seniority level (even if the generated salary range appears accurate at first glance).

Then there’s another practical concern.

Feeding internal job architecture and employee pay context into AI tools can raise security, privacy, and governance concerns around sensitive pay data, and may conflict with your company's AI usage policies and data controls.

Put simply, what looks like a quick and free way to benchmark new roles can quickly become costly, unreliable, and difficult to defend – and even risky from a security and data governance perspective.

4. Lack of methodology transparency

When making compensation decisions, you need more than a salary number alone.

You need to understand where the benchmark came from, how the data was collected, and whether the market comparison is actually reliable.

That’s what allows compensation teams to confidently explain – and when needed, defend – salary bands, pay adjustments, and hiring benchmarks.

But AI-generated salary benchmarks don’t provide that level of transparency.

With ChatGPT-generated benchmarks, there’s limited visibility into:

Where benchmark data comes from
How benchmarks are calculated
Whether the data is statistically reliable
Whether roles are mapped consistently
Whether benchmarks reflect the current hiring market.

And even if you ask ChatGPT to explain how a benchmark was generated, it’ll still answer using the same publicly available and often unverified data sources.

The result? Compensation teams can end up making high-stakes pay decisions without a clear way to independently verify or justify the underlying market data.

AI tools for salary benchmarking vs real-time benchmarking platforms

A far more reliable alternative to AI tools for compensation data: real-time benchmarking tools that source data from integrations with contributing companies’ HR systems.

Where AI tools are designed to sound helpful, purpose-built salary benchmarking platforms are designed to support accurate, explainable, and defensible compensation decisions.

That’s the core difference.

AI tools can generate quick salary estimates using publicly available information online.

But compensation benchmarking companies are built specifically to solve the operational challenges compensation teams face around benchmark reliability, market comparability, job levelling, and pay transparency.

Here’s the overview:

AI salary benchmarking (e.g. Claude, ChatGPT, Gemini)	Real-time salary benchmarking tools (e.g. Ravio)
Use unstandardised and unverified publicly available salaries on the internet to generate pay benchmarks.	Use live HRIS integrations to generate more accurate, continuously updated compensation benchmarks.
Lack transparent data sourcing and verification methodology.	Provide clear visibility into data sources, market coverage, and benchmark methodology (specifics depend on the salary benchmarking tool)
Generic benchmarks that miss city-level, industry, and company-stage nuances.	Granular filtering across location, company size, industry, stage, and compensation structure.
Struggle to accurately interpret internal job levelling and role scope.	Human-led job mapping against a defined job catalogue and level framework, for accurate like-for-like benchmarking.
Limited visibility into benchmark quality or confidence.	Providers like Ravio offer benchmark confidence indicators, sample sizes, and methodology transparency.
Difficult to independently validate or defend internally.	Designed to support explainable and defensible compensation decisions.

The difference comes down to whether you can confidently trust, validate, explain, and defend the benchmarks behind your very real pay decisions.

Compensation benchmarking needs more than web-scraped salary averages

The real issue with AI-generated salary benchmarks is that they simplify a process that’s deeply dependent on market context, role comparability, compensation structure, and benchmark methodology.

And that becomes risky when compensation decisions influence hiring, retention, salary banding, payroll costs, and pay transparency compliance.

Effective compensation benchmarking isn't about generating a salary number quickly. It depends on understanding whether the benchmark is reliable enough to support real pay decisions.

If you’re looking for reliable alternatives to AI benchmarking, we’ll leave you with our guide on the best (and worst) salary benchmarking tools in 2026.

FAQs

Is ChatGPT good for compensation benchmarking?

No, ChatGPT isn’t reliable for high-stakes compensation decision-making. Because it relies on publicly available salary data that is often unverified, outdated, based on averages, and lacks proper job levelling, it can’t help you make real compensation decisions.

How accurate is ChatGPT for salary benchmarking?

ChatGPT salary benchmarking accuracy depends entirely on the quality of the public salary data available online. Because much of this data is self-reported, broad, and inconsistently levelled, AI-generated salary ranges can appear credible while still being inaccurate, outdated, or irrelevant to the specific roles, company stage, team structure, or compensation model you’re benchmarking.

Is AI reliable for compensation decisions?

Because compensation decisions require reliable market data, accurate job levelling, and transparent benchmark methodology, AI tools are currently unreliable for salary benchmarking. In practice, AI-generated salary benchmarks are difficult to validate, explain, and defend internally.

What’s the best AI tool for salary benchmarking?

There’s currently no standalone generative AI tool that fully replaces real-time compensation benchmarking platforms. Where AI tools source inaccurate, unverified data from publicly available sources, tools like Ravio and Pave use HRIS integrations to source accurate and up-to-date total rewards salary data that’s mapped to a consistent job architecture.

Can ChatGPT create compensation ranges?

Yes, ChatGPT can generate compensation ranges using publicly available salary information online. However, while AI tools can automate some compensation workflows, the data behind the salary ranges they build often lacks reliable context on job levelling, company stage, location, compensation structure, and benchmark methodology – making them risky for real pay decisions.

Does ChatGPT understand job levelling?

No, ChatGPT does not inherently understand internal job levelling structures. Its salary estimates rely heavily on publicly available salary data, where job titles alone rarely provide enough context around role scope, responsibilities, or seniority levels to support accurate benchmarking without additional structured input and validation.

What are the risks of using AI for salary benchmarking?

The biggest risks of using AI for salary benchmarking include benchmarking the wrong roles or seniority levels, overpaying or underpaying employees, and struggling to confidently explain or defend compensation decisions internally. AI-generated salary ranges can look accurate while still being based on flawed role comparisons, outdated market data, or incomplete compensation information.

Why do HR teams still need compensation software in the AI era?

HR and compensation teams still need dedicated compensation software because benchmarking requires more than broad salary estimates. Compensation platforms provide trustworthy market data, transparency on data sourcing and verification methodologies, market filtering, job levelling workflows, and support to build dynamic salary bands needed to make accurate and explainable pay decisions.

Can ChatGPT benchmark salaries by company stage?

ChatGPT can attempt company-stage salary benchmarking, but public salary data rarely contains enough structured information about funding stage, company maturity, or compensation philosophy to generate consistently reliable benchmarks. This becomes especially difficult for startups, emerging roles, and niche hiring markets.

Can AI help me do benchmarking faster?

Yes, AI tools can speed up early-stage salary research, summarisation, and benchmark comparisons. But faster benchmarking does not automatically mean more accurate benchmarking, because effective compensation decision-making still relies on pay benchmarks that reflect the specific roles, locations, and hiring markets you’re benchmarking for.
