DEV Community

Found 897 Fake Followers on DEV.to Here's How I Proved It

GnomeMan4201 on May 25, 2026

A full technical audit of a coordinated follower inflation network — methodology, findings, and a detection rule simple enough to run in one query....
 
Ben Halpern

Thanks for this and for your email. This is something we're constantly trying to improve, and any help in squashing these rings is appreciated.

GnomeMan4201

I appreciate you taking the time to review it.

Hopefully the data helps.

KL3FT3Z

Fantastic work - especially the graph analysis and the HMAC reverse. I want to suggest one more dimension for your next investigation that flows directly from your findings.

You showed that upvote.club sells not just follows, but also comments, likes, and unicorns. This means a second layer of infrastructure may already be active beneath "good" articles on DEV.to - AI agents writing templated comments to fulfill orders for "meaningful engagement."

Here's what I'd add to your toolchain:

1. Comment text pattern analysis

  • Your fake followers are following=1, zero articles. But what if some of them (or a parallel cohort) have already graduated to the "earn points by commenting" phase?
  • upvote.club explicitly asks for "meaningful comment text" - and that's the key phrase. In their context, "meaningful" means "not spam that immediately triggers moderation." So the comment must be grammatically correct and topically relevant, and yet completely meaningless in terms of real dialogue.
  • The perfect job for an LLM agent: read the article, generate 2-3 paragraphs of generic phrases, drop in a "thanks for sharing" and a "looking forward to more."

2. A graph signal for comments - the Following=1 equivalent

  • You had Following=1 as an invariant for followers. For commenters, the analogous invariant could be: an account that left a comment on an article, but has articles=0, comments_count=1 (or very low), and its only comment is on an article that's active in the upvote.club queue.
  • If the extension intercepts POSTs to /comments (you confirmed this in the static analysis of social/devto.js), then comments are a full attack vector, no less automatable than follows.
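
The commenter invariant described above can be sketched as a simple filter. This is a minimal illustration, not the article's tooling: the `Commenter` record, its field names, and the `queued_article_ids` set are all hypothetical stand-ins for whatever the profile and queue data actually look like.

```python
from dataclasses import dataclass, field


@dataclass
class Commenter:
    # Hypothetical record; field names are assumptions, not a real API schema.
    username: str
    articles_count: int
    comments_count: int
    commented_article_ids: list = field(default_factory=list)


def matches_comment_invariant(account, queued_article_ids, max_comments=1):
    """True when the account has no articles, almost no comments, and every
    comment it left is on an article observed in the paid queue."""
    if account.articles_count != 0:
        return False
    if not 1 <= account.comments_count <= max_comments:
        return False
    return bool(account.commented_article_ids) and all(
        article_id in queued_article_ids
        for article_id in account.commented_article_ids
    )


accounts = [
    Commenter("a1", 0, 1, [101]),        # fits the invariant
    Commenter("a2", 3, 40, [101, 102]),  # established author
    Commenter("a3", 0, 1, [999]),        # new account, but article not in queue
]
queued = {101, 102}
flagged = [a.username for a in accounts if matches_comment_invariant(a, queued)]
print(flagged)
```

Raising `max_comments` loosens the "or very low" part of the rule; the queue check is what keeps ordinary first-time commenters from being flagged.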

3. What to look for in the text

  • Perfect grammar + zero specificity. Real developers write with typos, abbreviations, and references to their own experience. LLM agents generate "smooth" text without a single concrete detail from the article.
  • Recurring n-grams or structural templates. For example: "This is a great deep dive into [topic]. I particularly appreciated how you explained [generic concept]. Have you considered [vague suggestion]? Looking forward to your next post!" - the skeleton may vary, but graph-wise it will cluster.
  • Temporal clustering. If 15 comments appear under an article within 24 hours, and 8 of them are from accounts with joined_at in the same wave (your November/May cohorts) - that's not hype, that's fulfillment.
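
A rough sketch of the temporal clustering check, assuming each comment can be reduced to a `(posted_at, joined_wave)` pair. The wave labels and the 24-hour window measured from the first comment are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def dominant_wave(comments, window=timedelta(hours=24)):
    """comments: list of (posted_at, joined_wave) tuples for one article.

    Returns (comments_in_window, most_common_wave, its_count), where the
    window starts at the first comment on the article.
    """
    if not comments:
        return 0, None, 0
    start = min(posted_at for posted_at, _ in comments)
    waves = [wave for posted_at, wave in comments if posted_at - start <= window]
    top_wave, top_count = Counter(waves).most_common(1)[0]
    return len(waves), top_wave, top_count


base = datetime(2026, 1, 1)  # arbitrary timestamp for the example
comments = [(base + timedelta(hours=i), "wave-A") for i in range(8)]
comments += [(base + timedelta(hours=8 + i), f"organic-{i}") for i in range(7)]
result = dominant_wave(comments)
print(result)
```

With the "8 of 15" example from the bullet above, the function reports 15 comments in the window and 8 from a single wave; where to set the alert threshold is a judgment call.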

4. The scariest part - recursive hallucinations

  • You mentioned that DEV.to is a community. But if an AI comment appears under an article, and then another AI agent (possibly from the same upvote.club or just an automated bot) replies to it — you get a hallucinatory dialogue. Two agents exchange gratitude and generic phrases, creating the illusion of a "lively discussion" under the article.
  • This is worse than fake followers. Followers are a metric. Comments are semantic noise that pollutes the platform's information space. A reader sees "20 comments," opens the thread, and finds 15 automated "great post" messages and 5 recursive replies between bots.

5. Technical implementation

  • Your score_account() could be extended to score_commenter():
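  # known_bot_waves, generic_praise_ratio() and no_specific_references() are
  # placeholders the caller must supply; they are not defined in this sketch.
  def score_commenter(profile, comment_text, article_content):
      """Return a 0-5 suspicion score; each matching heuristic adds one point."""
      score = 0
      if profile['articles_count'] == 0: score += 1   # never published anything
      if profile['comments_count'] <= 2: score += 1   # almost no comment history
      if profile['joined_at'] in known_bot_waves: score += 1   # joined in a known signup wave
      if generic_praise_ratio(comment_text) > 0.8: score += 1  # mostly stock praise
      if no_specific_references(comment_text, article_content): score += 1  # cites nothing concrete
      return score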
  • For template detection - simple TF-IDF + cosine similarity between comments from the same author or the same wave. Or even perceptual hashing, but for text: signature sentence hashes.
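
The TF-IDF plus cosine similarity idea can be tried with nothing but the standard library. This is a minimal sketch: the tokenizer, the smoothed IDF formula, and the sample comments are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so terms that appear in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


comments = [
    "This is a great deep dive into caching. Looking forward to your next post!",
    "This is a great deep dive into testing. Looking forward to your next post!",
    "The retry loop in step 3 drops the jitter, so clients will stampede after a timeout.",
]
vectors = tfidf_vectors(comments)
template_sim = cosine(vectors[0], vectors[1])  # same skeleton, different topic word
organic_sim = cosine(vectors[0], vectors[2])   # unrelated, specific comment
print(round(template_sim, 2), round(organic_sim, 2))
```

Comments built on the same skeleton score close to 1 against each other, while a specific technical comment scores near 0, so pairwise similarity within one author or one signup wave is the quantity to cluster on.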

6. Why this matters right now

  • You discovered that upvote.club sells comments. You confirmed that the extension intercepts POSTs to /comments. You showed that the API is fully replayable.
  • The logical next step: if a customer buys "DEV.to Comments" and the executor is a human with the extension installed, which can autonomously execute tasks (you captured this live - task 64918 completed without interaction), then comments could be generated not by a human, but by a script that calls an LLM and posts the result via the replayable API.
  • upvote.club doesn't verify what is written in the comment - they only verify that a POST request happened. This means the comment quality can be absolutely zero, and the platform will still count it as completed.

TL;DR for your next project:

  • Scrape comments under articles that were simultaneously in the upvote.club queue (you have the API for this).
  • Check if any accounts with following=1 (your fake followers) suddenly started commenting.
  • Look for text clusters — recurring templates, "meaningful" generic phrases, absence of specificity.
  • If you find recursive dialogues between suspicious accounts - that's another graph invariant, no weaker than Following=1.
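
One way to look for recursive dialogues is to walk the reply edges of a thread and keep those where both ends are already-suspicious accounts. This is a sketch under assumptions: the `comments` mapping and the `suspicious` set are hypothetical inputs, produced by whatever scoring step comes first.

```python
def suspicious_reply_pairs(comments, suspicious):
    """comments: dict of comment_id -> (author, parent_comment_id or None).

    Returns (parent_author, reply_author) pairs where both accounts are in
    the `suspicious` set and are not the same account.
    """
    pairs = []
    for comment_id, (author, parent_id) in comments.items():
        if parent_id is None or parent_id not in comments:
            continue  # top-level comment or missing parent
        parent_author = comments[parent_id][0]
        if (
            author != parent_author
            and author in suspicious
            and parent_author in suspicious
        ):
            pairs.append((parent_author, author))
    return pairs


thread = {
    1: ("bot_a", None),
    2: ("bot_b", 1),   # suspicious account replying to suspicious account
    3: ("alice", 1),   # ordinary reader replying
    4: ("bot_a", 2),   # the exchange continues
    5: ("bob", None),
}
pairs = suspicious_reply_pairs(thread, {"bot_a", "bot_b"})
print(pairs)
```

Each returned pair is an edge in a graph of suspicious-to-suspicious replies; repeated edges between the same two accounts across articles would be the stronger signal.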

Your methodology with score_account and graph invariants maps perfectly onto this vector. Followers are volume. Comments are informational environment toxicity. And if the follower botnet is already proven, the comment botnet is just the same extension hitting a different endpoint.

Thank you for open-sourcing the toolchain and for publishing the methodology, not just the results. That makes the ecosystem more resilient.

GnomeMan4201 • Edited

I really appreciate you commenting on this and adding to the conversation. It's rare to have any dialogue about this niche area.

The comment vector framing is dead on, and the infrastructure confirmation goes deeper than what I published.

On the extension intercepting /comments POSTs: confirmed, but the scope is wider than DEV.to. The Helper App (fkiaohmeeoiipoknngcppjbkinaamnof) intercepts LIKE, UNICORN, SAVE, BOOKMARK, FOLLOW, and COMMENT across 20+ platforms through the same C2 at api.upvote.club. Instagram and Threads get GraphQL hooks running in the MAIN world: not DOM manipulation, but actual page-level API interception. The task loop is fully documented: the C2 delivers taskId + completionToken via URL params, the content script executes the action, an HMAC-SHA256 signature built from ct+taskId+action+timestamp proves completion, and the result is POSTed to /api/complete-task/{taskId}/. The whole thing is replayable without a human in the loop.
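
To illustrate why a client-computed signature of this kind cannot prove that a human acted, here is a minimal sketch using Python's standard `hmac` module. The key, the field values, and the plain concatenation without separators are all assumptions for illustration; the actual message layout is not given in this thread.

```python
import hashlib
import hmac


def completion_signature(key, ct, task_id, action, timestamp):
    """HMAC-SHA256 over the four fields named in the comment.

    Field order follows the comment; joining them with no separator is an
    assumption made for this sketch.
    """
    message = f"{ct}{task_id}{action}{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


key = b"hypothetical-client-side-key"  # placeholder, not a real secret
first = completion_signature(key, "tok", "123", "FOLLOW", "1700000000")
replay = completion_signature(key, "tok", "123", "FOLLOW", "1700000000")
print(hmac.compare_digest(first, replay))
```

Because every input, including the key, lives on the client, any script holding the same values produces the same signature, which is the sense in which the loop is replayable.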

On the two-vector separation I published: that framing may be wrong. 58% of MD5 avatar cluster groups contain BOTH pure bot accounts (hash-generated usernames, zero activity) AND confirmed upvote.club engagement farm participants. Same image source feeding both populations. The parsimonious explanation is one account factory with two deployment modes: some accounts get handed to the engagement farm as "real users," some get deployed as pure follow bots. The line between the two is less clear than the platform claims.
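
A minimal sketch of how an overlap figure like this could be computed: group accounts by the MD5 of their avatar bytes, keep groups with at least two members, and count the groups that contain both populations. The labels `"bot"` and `"farm"`, the tuple layout, and the two-member cutoff are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def mixed_cluster_ratio(accounts):
    """accounts: iterable of (username, avatar_bytes, label).

    label is "bot" or "farm" (hypothetical names for the two populations).
    Returns the share of multi-account avatar clusters containing both labels.
    """
    clusters = defaultdict(list)
    for _username, avatar_bytes, label in accounts:
        clusters[hashlib.md5(avatar_bytes).hexdigest()].append(label)
    groups = [labels for labels in clusters.values() if len(labels) > 1]
    if not groups:
        return 0.0
    mixed = sum(1 for labels in groups if {"bot", "farm"} <= set(labels))
    return mixed / len(groups)


sample = [
    ("x1", b"avatar-A", "bot"), ("x2", b"avatar-A", "farm"),  # mixed cluster
    ("x3", b"avatar-B", "bot"), ("x4", b"avatar-B", "bot"),   # bots only
    ("x5", b"avatar-C", "farm"),                              # singleton, ignored
]
ratio = mixed_cluster_ratio(sample)
print(ratio)
```

MD5 is used here only as a fingerprint for identical image bytes, so its weakness as a cryptographic hash does not matter for this purpose.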

On the self-boosting angle: there's a specific instance I haven't published yet. They're running their own Helper App extension through their own CHROMESOCIAL network (product ID 46) to inflate its install count and reviews on the Chrome Web Store. The operation is farming itself. That's a direct CWS policy violation on top of everything else, and it's verifiable from the product catalog.

On the full-circle architecture: here's the part that reframes the whole thing. The Helper App includes a captureVisibleTab handler that screenshots whatever tab is active during verification and uploads it to /api/social-profiles/upload-verification-screenshot/, authenticated with a hardcoded EXTENSION_SCREENSHOT_UPLOAD_SECRET in background.js. They built covert surveillance into their own engagement farm to police their own members. The fraud detection mechanism is itself a privacy violation. Anyone who installed this extension to earn points handed the operator a silent screen capture capability. I logged the endpoint, caught it live, and confirmed it: HTTP 401 without auth, HTTP 200 with a valid JWT plus the hardcoded secret. The secret is in every CRX that's been distributed. That's the trust hierarchy: operator surveils members, members surveil platforms, platforms surveil users. Three layers of covert observation stacked on top of each other.

On comment text patterns and the LLM pipeline: their own blog posts document it explicitly. Posts 170 and 199 describe an AI drafting pipeline for Quora: AI writes natural, non-promotional replies designed to pass moderation, aged accounts post them, the community upvotes them, and deleted answers get reposted with modified text to evade re-detection. That's not speculation; that's their published product description. The meaningful_comment handler in devto.js is the same pipeline hitting a different endpoint.

On active platform security evasion: this is where it crosses a line. Blog post 169 is an operator admission I haven't published yet. Direct quote from their own content: when someone completes a GitHub task through the extension, GitHub sees Google as the referrer. Not their platform. Not an unknown source. Google. That's deliberate header spoofing to circumvent GitHub's fraud detection systems. That's not engagement farming. That's active evasion of platform security infrastructure, documented in their own words. Combined with the Quora answer restoration pipeline (deleted content reposted with modified text to evade re-detection) and the 17-platform coordinated comment injection system, these are operator admissions of systematic detection evasion across multiple platforms simultaneously.

On the recursive hallucination model: this is the thread I want to pull hardest. You're right that it's worse than fake followers. A follow is a vanity metric. A comment thread between two agent-generated accounts under a real article is synthetic discourse injected into a community's knowledge base. The detection invariant you proposed (accounts with articles=0, comments_count=1, and their only comment on an article active in the upvote.club queue) is exactly the right shape. I have the API access to run that query. The temporal clustering angle closes the gap: if 8 of 15 comments under an article within 24 hours come from accounts in the same joined_at wave, that's not hype, that's fulfillment. That's a graph invariant I haven't formalized yet but should.

On operator infrastructure and attribution: I went down every thread I could find and link together. Domain registration chains, git commit metadata, Firebase UIDs, Discord IDs, blog post authorship, product catalog ownership, MX record overlap, Yandex Metrica tag sharing, S3 bucket fingerprints, Wayback Machine snapshots, employer records recovered from commit history across repos spanning nearly a decade. I mapped the entire organization from the inside out: legal entity, infrastructure stack, named individuals, development environment, side operation vs day job. The person at the head of this is genuinely technical. This isn't a marketer who hired a developer. The architecture of the extension, the HMAC signing scheme, the C2 design, the multi-platform GraphQL hooks: whoever built this knows exactly what they're doing. The hardcoded ngrok tunnel in the production build pointing to a Vultr VPS dev environment was the opsec failure that confirmed it. The version bump from 1.1.26 to 1.9.43 post-disclosure had zero code changes: identical background.js, identical hardcoded secret, identical surveillance mechanisms. They saw the disclosure and changed the version number. Nothing else. That tells you everything about how they assessed their exposure.
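
A claim like "zero code changes between versions" can be checked by hashing every file in two unpacked copies of an extension and listing the files that differ. The sketch below is illustrative only: the directory layout is hypothetical, and ignoring `manifest.json` (where a version string normally lives) is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_hashes(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_root, new_root, ignore=("manifest.json",)):
    """Files added, removed, or modified between two unpacked builds."""
    old, new = tree_hashes(old_root), tree_hashes(new_root)
    return sorted(
        name
        for name in old.keys() | new.keys()
        if name not in ignore and old.get(name) != new.get(name)
    )


# Self-contained demo with two throwaway directories standing in for builds.
with tempfile.TemporaryDirectory() as tmp:
    for version in ("old", "new"):
        build = Path(tmp, version)
        build.mkdir()
        (build / "background.js").write_text("const SECRET = 'same';")
        (build / "manifest.json").write_text(f'{{"version": "{version}"}}')
    diff = changed_files(Path(tmp, "old"), Path(tmp, "new"))
print(diff)
```

An empty list means every non-ignored file is byte-identical across the two builds, which is the condition described above.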

The score_commenter() sketch you wrote fits the existing codebase almost exactly as written. Part 3 is already outlined. The recursive hallucination section is its own threat model, not a footnote, and I'm curious what you think.

KL3FT3Z

Good data on the 58% overlap! A unified pipeline with JIT routing makes more sense than two silos. Confirms S3 sequencing and avatar provenance are cross-branch invariants, which simplifies detection logic.
The CWS self-boost loop via CHROMESOCIAL (product ID 46) is a separate report to Google. Shadow product farming its own install count through the same extension that passed review: that's a CWS trust model failure, not just a policy violation.
On the surveillance hierarchy: you named three layers, there's a fourth. The operator collects on every site the member visits, not just task targets. 2,000 browsers as passive SIGINT platform, funded by microtask economics. The hardcoded screenshot secret in every distributed CRX means anyone who installed prior to disclosure has a known-static credential potentially exposing their full browsing session. That's not C2 architecture, that's bulk collection.
Posts 170/199 confirm the adversarial loop: generate → post → detect → evade → regenerate with semantic distance checks. The Quora → DEV.to pipeline transfer means meaningful_comment is a standardized LLM interface, not a one-off. Confirms the handler in devto.js is production code for cross-platform deployment.
Post 169 establishes intent. Operator's own words on referrer spoofing = documented evasion of platform security infrastructure. Published in members-only blog = either arrogance or misjudgment of member loyalty. Neither changes the legal character.
On recursive hallucinations: agree on the invariant shape. I'd add referential thinness as a signal: LLM comments mention "the approach" or "your analysis" without anchoring to specific paragraphs, methods, or findings. Real technical comments cite specifics. Generic praise + zero referential anchors = high-confidence synthetic.
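
Referential thinness can be approximated by counting the distinctive article terms a comment reuses. This is a crude sketch: the length cutoff, the tiny stopword list, and the sample texts are assumptions, and real use would need a better notion of "distinctive".

```python
import re

# Small illustrative stopword list; an assumption, not a vetted resource.
STOPWORDS = {"about", "their", "there", "which", "would", "article", "great", "thanks"}


def referential_anchors(comment, article, min_len=5):
    """Distinctive article terms (long, non-stopword) that the comment reuses."""
    def terms(text):
        return {
            word
            for word in re.findall(r"[a-z0-9_]+", text.lower())
            if len(word) >= min_len and word not in STOPWORDS
        }
    return terms(comment) & terms(article)


article = "We matched avatars by hash and found the following=1 invariant across accounts."
generic = "Great analysis! I really appreciated your approach. Looking forward to more."
specific = "The invariant breaks for accounts that follow two people."

generic_anchors = referential_anchors(generic, article)
specific_anchors = referential_anchors(specific, article)
print(generic_anchors, specific_anchors)
```

A comment with no anchors is not proof of automation on its own, since short human replies also score zero, so this fits best as one input to a combined score.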
Version bump 1.1.26 to 1.9.43 with zero code changes and identical secret: they assessed exposure as manageable. Implies overconfidence or backup infrastructure. Both are bad for defenders.
Re: Part 3: your API access validates the query. A single instance of joined_at-clustered comments under an active queue article = proof of concept for the recursive hallucination model. The score_commenter() sketch maps directly to your existing codebase.
One question: with your attribution depth (legal entity, named individuals, infrastructure stack), have you evaluated the evidence preservation boundary? Platform reports and CWS disclosure cover the technical side. Operator blog posts contain admissions that may support law enforcement interest. At what point doesn’t evidence?

GnomeMan4201

Evidence preservation boundary: yes, I've evaluated it. Everything is in a public versioned repo: 36 timestamped commits, raw authenticated API captures, full decompiled extension source, 688KB product catalog JSON, and operator blog post admissions, all committed before the storefront went offline May 26. The commit timeline predates the operator response, so the preservation is independently verifiable, not just my word. AKI GDPR complaint is filed. My honest hesitation on LE has been whether a solo independent researcher gets taken seriously by a foreign DPA. But looking at what's actually in the repo, this isn't a tip; it's a case file. The FTC is also in scope given ~4,000 US installs and undisclosed surveillance collection under a misleading listing. I've sat on the LE referral question. Your question is making me reconsider that.

GnomeMan4201

One addition to the evidence preservation point: the investigation is still active. The backend is live as of today, API still serving ~4,000 extension users, CRX unchanged at v1.1.26 since May 24. Longitudinal monitoring has been running continuously since disclosure. The case file isn't frozen; it's still being written.

KL3FT3Z

Good chain of custody. 36 timestamped commits predating operator response = independent verifiability, not just researcher credibility.
On LE hesitation: solo researcher concern is valid, but the repo structure changes the calculus. This isn't a tip requiring DPA resources to investigate; it's a pre-built case file. DPA or FTC can intake this directly and issue enforcement without a preliminary phase. The 688KB product catalog + decompiled source + live captures = evidence package, not leads.
FTC is the sharper angle here. ~4K US installs, undisclosed collection under misleading CWS listing = Section 5 violation with precedent (Stylish, Hover Zoom). FTC doesn't need to prove harm to individual users - deceptive practices in data collection are sufficient. GDPR complaint covers EU installs. Two jurisdictions, one evidence package.
On storefront going offline May 26: operator response confirms they assessed exposure. But offline storefront ≠ destroyed evidence. Your commit timeline captured everything pre-response. Their takedown is actually corroborating evidence - flight response after disclosure.
On the hesitation itself - I get it. Solo researcher, impostor syndrome, self-checking instead of acting fast. Same here. The difference between a tip and a case file is what lets you push past that. Your repo is the case file. You don't need to be right about everything; you need to be right about enough, and the evidence speaks for itself. Send it.

Yunetzi

Fake followers pollute the signal. Time for verifiable identities and public audits so real engagement actually counts.

GnomeMan4201 • Edited

Agree about the signal problem, but I don’t want my identity and data sitting with every platform’s security team just to participate online more than I already have to. Forced verification isn’t a complete solution. We should also be teaching people what manipulation looks like and how to see through coordinated noise and fake engagement.

Taqui

i have more than 20k 😭

GnomeMan4201

Somewhere there’s a server farm that’s a big fan of your work.

Srinivas Kondepudi

Glad I reached the end of the article :)

but the findings are real, great work really, thanks for this :)

GnomeMan4201

Thanks, I appreciate you taking the time to read through it. I know it is a long read.
I wanted to make sure the claims were backed by reproducible evidence instead of assumptions.

Tracy Gilmore • Edited

I have long stopped paying any attention to the number of followers when I realised many of them were linked to gambling sites in the Far East and had no bio recorded.

Goodhart's law is an adage that has been stated as,

"When a measure becomes a target, it ceases to be a good measure".

There is also the fact that any measure is bound to be gamed eventually, but I don't know of a quote for that.