
Citation metrics such as the h-index offer a quick snapshot of scholarly influence, but snapshots can blur or even distort the underlying picture. The same numbers that capture genuine engagement can also mask superficial volume, recycled ideas, or tactical self-promotion. Metrics alone cannot reveal whether a body of work has advanced the conversation or simply mastered the rules of the game.
Jorge E. Hirsch, the physicist who invented the h-index, has grown deeply critical of its misuse. Reflecting on its impact, he writes:
“I proposed the H-index hoping it would be an objective measure of scientific achievement. By and large, I think this is believed to be the case. But I have now come to believe that it can also fail spectacularly and have severe unintended negative consequences. I can understand how the sorcerer’s apprentice must have felt.”
Despite this warning, academia continues to worship at the altar of a number that was never designed to bear so much weight.
When careers, promotions, and grants hinge on numbers like the h-index, it becomes tempting to reach for shortcuts: relentless self-citation, ever-expanding author lists, tight cross-citation circles, survey-paper swarms, and other inflation tricks. Even well-intentioned researchers can drift into these habits without realising it.
This guide is a frank checkpoint. It will help you:
- Detect early signs of citation inflation in your own profile.
- Benchmark your metrics against researchers whose integrity you admire.
- Use data‑driven tools to decide what the numbers really mean for you and whether they still reflect the scholarly contribution you set out to make.
H-Index and Total Citations Don't Say Much
Relying solely on your h‑index or raw citation totals is like judging a book by the thickness of its spine: you get volume without context. A towering h‑index can still mask critical nuances:
- Authorship Inflation: Multi-author papers can rocket citation counts while diluting individual contribution. The h-index credits every co-author equally, whether they led the work or merely lent a dataset.
- Citation Distribution: A handful of blockbuster papers can prop up both metrics while the rest of your portfolio languishes. Integrity lies in the long tail, not just the headline numbers.
- Self-Citation & Cartels: Both metrics count every citation the same, even when you are citing yourself or trading citations within a tight circle of collaborators.
- Time Dynamics: Neither metric tells you whether your influence is growing, plateauing, or fading. Age-weighted indices and annualised growth rates reveal these trends.
In this article I will introduce a set of more analytical metrics, along with the tools needed to extract them, that dig beneath the surface of your publication and citation data. These metrics go beyond the simplicity of raw counts to reveal patterns in authorship, citation timing, publication types, and collaboration structures. Taken together, they offer a more nuanced and honest picture of your academic footprint, one that can expose early signs of citation inflation, authorship dilution, or other forms of imbalance. If the h-index tells you how much you've been cited, these tools help you understand how, why, and whether those citations reflect genuine impact.
Using Analytical Metrics to Detect Integrity Risks
No single metric tells the whole story, but used in combination, these indicators can surface patterns in your profile that hint at inflation, imbalance, or genuine influence. Here's how specific metrics map to key integrity risks.
1. Authorship Inflation
Publishing frequently is not the same as contributing meaningfully. If your name appears on many papers with long author lists, but your individual input is limited or unclear, your citation metrics may look stronger than your actual contribution. Authorship inflation is common in large research groups or among collaborators who routinely include each other for strategic reasons. To detect it, you can use the following metrics in combination (a short code sketch computing them follows the list).
Key Metrics to Identify Authorship Inflation & How to Use Them
- Authors_Paper (average number of authors per paper): High values relative to your field's norms suggest frequent participation in large collaborations, which may dilute your individual contribution.
- Papers_Author (average number of papers per unique co-author, i.e., Total Papers / Unique Co-authors): A low ratio indicates your publication output is spread across many co-authors, which can signal that your contribution per paper is minimal.
- hI_index (individual h-index adjusted for co-authorship, i.e., h / average number of co-authors in the h-core): Adjusts your h-index downward if your most cited papers have large teams, making it more reflective of individual impact.
- hI_norm (normalized h-index using fractional counting, where each paper contributes 1 / number of authors): Compensates for large co-author teams by giving a more realistic picture of your individual contribution, preventing inflated credit when you're consistently one of many names on high-impact papers. If your hI_norm is significantly lower than your raw h-index, much of your citation impact may come from heavily co-authored work.
- hm_index (harmonic, authorship-adjusted h-index): Refines the h-index by incorporating each paper's co-author count through fractional ranks. Rank your papers by citations and accumulate an effective rank of 1 / number of authors per paper down the list; hm is the largest effective rank at which the paper's citation count still meets or exceeds that rank. Unlike hI_index, which applies a single average co-authorship adjustment, hm_index adjusts each paper individually, penalizing large author lists more precisely while preserving credit for solo or small-team work. If your hm_index is much lower than your raw h-index, a significant portion of your citations comes from large, heavily co-authored papers; a smaller gap between h and hm indicates stronger individual authorship impact.
- hA (authorship distinctiveness index): Detects repetitive collaboration with the same close circle. It measures how diverse the co-authorship network within your h-core is (the set of papers used to calculate your h-index) by analysing how frequently the same co-authors appear across your most cited papers. A low hA value means you are repeatedly publishing with the same people, which could suggest an insular or strategically reinforced citation network; a high value indicates collaboration with a wider, more diverse set of researchers. If your hA is low even with a strong h-index, it may raise questions about the independence and breadth of your scholarly influence.
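To make these definitions concrete, here is a minimal Python sketch that computes the co-authorship-adjusted metrics from a plain list of papers. The `papers` structure and its field names (`citations`, `n_authors`, `coauthors`) are illustrative assumptions, not any tool's real API; the hm computation follows the fractional-rank definition given above.

```python
# Minimal sketch: co-authorship-adjusted metrics from a list of papers.
# The `papers` layout below is an illustrative assumption, not a tool's API.

def h_index(citations):
    """Largest h such that at least h items have a value >= h."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def authorship_metrics(papers):
    """papers: dicts with 'citations', 'n_authors', 'coauthors' (names, excluding you)."""
    total_papers = len(papers)
    total_authors = sum(p["n_authors"] for p in papers)
    unique_coauthors = set().union(*(p["coauthors"] for p in papers))

    h = h_index([p["citations"] for p in papers])
    ranked = sorted(papers, key=lambda p: p["citations"], reverse=True)

    # hI_index: divide h by the average team size in the h-core
    avg_team = sum(p["n_authors"] for p in ranked[:h]) / h

    # hI_norm: h-index over fractional citation counts (citations / n_authors)
    hi_norm = h_index([p["citations"] / p["n_authors"] for p in papers])

    # hm_index: accumulate fractional ranks (1 / n_authors) down the citation-ranked
    # list; hm is the largest accumulated rank a paper's citations still cover
    r_eff, hm = 0.0, 0.0
    for p in ranked:
        r_eff += 1.0 / p["n_authors"]
        if p["citations"] >= r_eff:
            hm = r_eff

    return {
        "Authors_Paper": total_authors / total_papers,
        "Papers_Author": total_papers / len(unique_coauthors),
        "h_index": h,
        "hI_index": h / avg_team,
        "hI_norm": hi_norm,
        "hm_index": round(hm, 2),
    }
```

Feeding in the publication lists behind the Researcher A/B comparison below would reproduce the pattern it shows: identical h-index, very different hI_index and hm_index.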
Example: Researcher A vs. Researcher B
| Metric            | Researcher A | Researcher B |
|-------------------|--------------|--------------|
| Total Papers      | 30           | 30           |
| Unique Co-authors | 80           | 20           |
| Total Authors     | 210          | 90           |
| h-index           | 15           | 15           |
| Authors_Paper     | 7.0          | 3.0          |
| Papers_Author     | 0.375        | 1.5          |
| hI_index          | 3.0          | 7.5          |
| hm_index          | 4.2          | 9.1          |
In this example both researchers have the same h-index, but Researcher A relies heavily on large, diffuse collaborations. Their adjusted metrics (hI_index, hm_index) show much weaker individual impact compared to Researcher B, who works with smaller teams and likely contributes more meaningfully to each publication.
2. Citation Concentration
A strong citation count or high h-index may look impressive, but these numbers can be misleading if your impact comes mostly from a handful of "hit" papers (e.g., survey or review papers). This is known as citation concentration: a small portion of your work carries most of the weight while the rest receives little attention. To detect it, you need metrics that examine how your citations are distributed across your publications.
Key Metrics & What They Reveal (a short code sketch computing them follows the list):
- g_index: This is more generous than the h-index because it accounts for cumulative citations. It's defined as the largest number g such that your top g papers have at least g² total citations. A high g_index relative to your h-index suggests that a few highly cited papers are propping up your overall profile.
- e_index: Measures "excess" citations in your h-core. It's calculated as the square root of the total citations of your h-core papers minus h² (equivalently, each h-core paper's citations beyond h, summed and square-rooted). A high e_index signals uneven citation distribution across your h-core: some superstar papers, some barely scraping by.
- h_coverage: The percentage of your total citations that comes from your h-core papers, calculated as h_coverage = Citations in h-core / Total Citations. A low value means your citations are spread across many papers, while a high value may signal heavy reliance on your most cited works.
- g_coverage: Similar to h_coverage, but based on your g-core (i.e., your top g papers). It tells you what portion of your total impact comes from your most influential publications. High g_coverage suggests that your profile depends on a narrow set of outputs.
- star_count: This is the number of your papers that fall into the top 1% by citation count in your field. Having a few “star” papers is normal, but an unusually high dependence on these outliers, especially if the rest of your work is barely cited, can create an inflated perception of consistent influence.
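Here is a minimal sketch, under the same illustrative assumptions as before, that computes these concentration metrics from a plain list of per-paper citation counts. Coverage values are returned as fractions; multiply by 100 for the percentages used in the comparison table below.

```python
import math

# Minimal sketch: citation-concentration metrics from per-paper citation counts.
def concentration_metrics(citations):
    ranked = sorted(citations, reverse=True)
    total = sum(ranked)

    # h-index: largest h such that the top h papers each have >= h citations
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

    # g-index: largest g such that the top g papers together have >= g^2 citations
    g, running = 0, 0
    for rank, c in enumerate(ranked, start=1):
        running += c
        if running >= rank * rank:
            g = rank

    # e-index: square root of the h-core citations in excess of h^2
    e = math.sqrt(max(sum(ranked[:h]) - h * h, 0))

    return {
        "h_index": h,
        "g_index": g,
        "e_index": round(e, 1),
        "h_coverage": sum(ranked[:h]) / total,  # fraction of all citations in the h-core
        "g_coverage": sum(ranked[:g]) / total,  # fraction of all citations in the g-core
    }
```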
Citation Concentration Comparison Example: Researcher A vs. Researcher B
| Metric | Researcher A | Researcher B |
|------------------|--------------|--------------|
| Total Papers | 30 | 30 |
| h-index | 15 | 15 |
| g-index | 25 | 18 |
| e-index | 45.6 | 12.5 |
| h_coverage | 75% | 48% |
| g_coverage | 90% | 60% |
| star_count | 5 | 1 |
In this example, Researcher A relies heavily on a small set of highly cited papers. Their g-index and e-index are much higher than their h-index, and their h_coverage and g_coverage suggest most of their impact is concentrated in a few papers. Researcher B has a more balanced citation profile, indicating broader, more consistent scholarly impact.
3. Self-Citation & Citation Cartels
Not all citations carry the same weight. If a significant portion of your citations comes from yourself or from a tight-knit group of collaborators who frequently cite one another, your metrics may be artificially inflated, even if unintentionally.
This kind of citation behaviour can distort your perceived impact and raises questions about academic integrity, especially when it becomes habitual or strategic.
Key Checks & Metrics:
- Manual Inspection for Self-Citations: Web of Science (WoS) and Scopus both provide tools to calculate your citation metrics with and without self-citations. In WoS, you can view your Author Record or Citation Report and select the option to exclude self-citations, which updates your citation counts and h-index accordingly. In Scopus, use the Citation Overview tool in your author profile and uncheck the “Include self-citations” box to instantly see adjusted metrics. You can also use the following tool for more in-depth analysis: https://github.com/mattrosenblatt7/self_citation (needs Scopus API Access).
- Cites_Author (i.e., total Citations / Unique Co-authors): If this number is unusually low, it may suggest that your citations are being generated mostly within a small, closed collaboration group.
- Cites_Author_Year (citations per co-author per year, i.e., Total Citations / (Unique Authors × Years)): A high value, especially early in your career, can indicate rapid citation growth fuelled by self-citation or reciprocal citing among a small inner circle. Compare it to peers with similar career lengths and publication counts to contextualise it; the sketch below shows how to recompute your h-index without self-citations.
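If you export per-paper citation counts that separate out self-citations (e.g., via the Scopus tools above), recomputing your h-index without them takes only a few lines. The `(total, self)` tuples below are hypothetical data standing in for a real WoS/Scopus export:

```python
# Hedged sketch: h-index with and without self-citations.
# Each tuple is (total citations, of which self-citations) for one paper;
# the numbers are hypothetical, not real exported data.

def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

papers = [(30, 12), (25, 10), (20, 9), (10, 6), (8, 4), (6, 3), (5, 2), (4, 2)]

h_all = h_index([total for total, _ in papers])
h_external = h_index([total - self_c for total, self_c in papers])
self_rate = sum(s for _, s in papers) / sum(t for t, _ in papers)

print(f"h-index: {h_all}, excluding self-citations: {h_external}, "
      f"self-citation rate: {self_rate:.0%}")
# -> h-index: 6, excluding self-citations: 4, self-citation rate: 44%
```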
| Metric               | Researcher A | Researcher B |
|----------------------|--------------|--------------|
| Total Citations      | 1,200        | 1,100        |
| Self-Citations       | 420 (35%)    | 55 (5%)      |
| h-index              | 16           | 15           |
| h-index (excl. self) | 10           | 14           |
| Cites_Author         | 15.0         | 55.0         |
| Cites_Author_Year    | 2.1          | 0.6          |
| Unique Co-authors    | 80           | 20           |
| Years Active         | 8            | 10           |
In this example, Researcher A appears more cited overall, but 35% of those citations are self-citations. When self-cites are excluded, their h-index drops from 16 to 10. Cites_Author and Cites_Author_Year also indicate dense intra-group citation activity, likely from a tight collaborator circle. Researcher B has slightly fewer total citations, but their h-index remains stable when self-citations are excluded, suggesting greater external recognition and a more balanced citation profile.
How to Extract These Metrics
1- Install and Use Publish or Perish
Publish or Perish is a free tool developed by Anne-Wil Harzing. You can download it from harzing.com.
- Open PoP and select Google Scholar Profile as your data source.
- Enter your name or Google Scholar ID.
- Once data is retrieved, right-click the result and choose “Metrics report” > “Copy full report” or save as a file.
- PoP version used in this guide: 8.18.5091.9307 (ARM AArch64) on macOS.
2- Understand Your Metrics
Here are the core metrics provided by PoP:
| Metric        | Equation / Logic                                       |
|---------------|--------------------------------------------------------|
| Cites_Year    | Total_Citations / Years                                |
| Cites_Paper   | Total_Citations / Total_Papers                         |
| Cites_Author  | Total_Citations / Unique_Authors                       |
| Papers_Author | Total_Papers / Unique_Authors                          |
| Authors_Paper | Total_Authors_All / Total_Papers                       |
| h_index       | From citation list                                     |
| g_index       | Largest g: sum(c₁...c_g) ≥ g²                          |
| hc_index      | h over age-discounted scores γ·cᵢ / (ageᵢ + 1)         |
| hI_index      | h / Avg_Authors_in_H_Core                              |
| hI_norm       | Each paper counts as 1 / n_authors                     |
| AW_index      | sqrt(Σ(cᵢ / (ageᵢ + 1)))                               |
| AWCRpA        | AWCR / Unique_Authors                                  |
| e_index       | sqrt(Σcᵢ − h²), cᵢ in h-core                           |
| hm_index      | Largest r_eff = Σⱼ≤ᵢ(1 / authorsⱼ) with cᵢ ≥ r_eff     |
| hI_annual     | hI_index / Years                                       |
| h_coverage    | H_Core_Cites / Total_Cites                             |
| g_coverage    | G_Core_Cites / Total_Cites                             |
| star_count    | # of top 1% cited papers                               |
| acc1–acc20    | # papers with ≥ 1/2/5/10/20 cites per year             |
| hA            | Authorship distinctiveness                             |
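The age-weighted entries (hc_index, AW_index) are the least self-explanatory, so here is a small sketch of how citations and paper age combine, following the table's (age + 1) convention. The γ = 4 weight for hc_index is the commonly cited default for the contemporary h-index; treat the exact constant as an assumption rather than PoP's authoritative implementation.

```python
import math
from datetime import date

# Sketch of the age-weighted metrics, using the table's (age + 1) convention
# (a paper published this year has age 0). gamma = 4 is an assumed default.
def age_weighted_metrics(papers, gamma=4):
    """papers: list of (citations, publication_year) tuples."""
    this_year = date.today().year
    aged = [(c, this_year - year + 1) for c, year in papers]

    # AWCR discounts each paper's citations by its age; AW_index is its square root
    awcr = sum(c / age for c, age in aged)

    # hc_index: an h-index computed over age-discounted scores gamma * c / age
    scores = sorted((gamma * c / age for c, age in aged), reverse=True)
    hc = sum(1 for rank, s in enumerate(scores, start=1) if s >= rank)

    return {"AWCR": round(awcr, 1), "AW_index": round(math.sqrt(awcr), 2), "hc_index": hc}
```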
Always contextualise your metrics within your discipline's norms. While 50-author papers are standard in experimental physics and biology, computer science's highest-impact work often features just 2–3 authors. These field-specific collaboration patterns mean authorship-based metrics must be interpreted relative to your particular academic domain to avoid misdiagnosing normal practice as citation inflation.
A Small Self-Assessment Python Script
The best way to interpret your citation metrics is in context. Instead of looking at your h-index or citation counts in isolation, it's far more meaningful to benchmark yourself against a group of peers, such as researchers in your department, research center, or field of study, or simply scholars whose work you admire. To make this easier, I've created a small Python script that takes a CSV file containing citation metrics for a group of researchers and calculates averages and quantiles (10th/90th percentiles). It then compares your metrics to these norms and highlights potential anomalies or red flags. The code and instructions are in the accompanying GitHub repository.
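For reference, a minimal sketch of that script's core logic is below. It assumes a CSV with a `name` column plus one numeric column per metric, which is an illustrative layout rather than the repository's fixed format.

```python
import pandas as pd

# Sketch: benchmark one researcher's metrics against peer averages and the
# 10th/90th percentiles. Assumes peer_metrics.csv has a 'name' column plus
# one numeric column per metric (h_index, hm_index, h_coverage, ...).
def benchmark(csv_path, me):
    df = pd.read_csv(csv_path)
    metrics = df.select_dtypes("number").columns
    mine = df.loc[df["name"] == me, metrics].iloc[0]

    p10, p90 = df[metrics].quantile(0.10), df[metrics].quantile(0.90)
    return pd.DataFrame({
        "you": mine,
        "group_mean": df[metrics].mean(),
        "p10": p10,
        "p90": p90,
        # Values outside the 10th-90th percentile band deserve a closer look
        "flag": [("HIGH" if mine[m] > p90[m] else "LOW" if mine[m] < p10[m] else "")
                 for m in metrics],
    })

print(benchmark("peer_metrics.csv", me="Your Name"))
```

Whether a HIGH or LOW flag is actually a red flag depends on the metric: a high self-citation share warrants scrutiny, while a high hm_index is reassuring, so read the flags against the interpretations given earlier in this guide.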