Toni Antunovic

Posted on Jun 20 • Originally published at lucidshark.com

10,000 Malicious GitHub Repos: Why AI Dependency Suggestions Are Now a Security Risk

#ai #webdev #devops #security


Security researchers recently disclosed a finding that should stop every developer using an AI coding tool in their tracks: more than 10,000 GitHub repositories are actively distributing Trojan malware. The repositories are designed to look legitimate, use real package names, and pass casual inspection. That number matters more than you think, because your AI coding tool has been trained on GitHub.

The Malicious Repo Problem

The campaign documented by researchers operates at a scale the security community has not seen before in a single coordinated wave. The attackers use three overlapping techniques to maximize reach:

Typosquatting: Repositories and packages whose names are one edit away from popular libraries (e.g., reqeusts instead of requests, coloers instead of colors). The human eye skips right over a swapped or extra letter.
Dependency confusion: Uploading public packages with the same name as internal private packages, exploiting how package managers resolve a name that exists in more than one registry. If your resolver prefers the public registry, or simply takes the highest version number it finds, the attacker wins.
Star-farming and social proof: Buying GitHub stars, cloning legitimate README content, and maintaining a convincing commit history. These repos look indistinguishable from healthy open-source projects at a glance.
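The typosquatting technique above lends itself to a mechanical check. Here is a minimal sketch, assuming you keep your own list of vetted package names (the KNOWN_GOOD set below is a hypothetical placeholder). It flags any name that sits one edit away from a trusted one, counting an adjacent-letter swap as a single edit.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute, and adjacent swap each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "eu" vs "ue"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical allowlist; in practice this would be your own vetted dependency list.
KNOWN_GOOD = {"requests", "colors", "lodash", "express"}


def lookalikes(name: str, known=KNOWN_GOOD, max_distance: int = 1) -> list[str]:
    """Return trusted names that `name` nearly matches without being equal to them."""
    return sorted(k for k in known if k != name and osa_distance(name, k) <= max_distance)
```

A non-empty result for a name you are about to install is a reason to stop and look, not proof of malice; plenty of legitimate packages have similar names.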

The Trojan payload is typically embedded in install-time code. In npm that means lifecycle scripts such as preinstall, postinstall, or prepare; in Python it means the setup.py of a source distribution, which pip runs to build the package. The moment you run npm install or pip install, the malicious script executes. It may exfiltrate environment variables, establish persistence, or beacon to a command-and-control server. By the time you see unexpected network traffic, the damage is done.

Warning: Install-time code in npm and PyPI packages executes automatically during installation. There is no confirmation prompt. If a dependency is malicious, the payload runs the moment your package manager installs it.
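To see which npm packages declare install-time scripts, you can read their manifests directly. This is a rough sketch rather than a complete audit: it only looks at package.json files under a directory (node_modules by default) and only at the hook names listed above plus install. The function names are illustrative.

```python
import json
from pathlib import Path

# Hook names from the text above, plus npm's plain "install" script.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")


def install_hooks(manifest) -> dict:
    """Return the install-time scripts declared in a parsed package.json."""
    if not isinstance(manifest, dict):
        return {}
    scripts = manifest.get("scripts")
    if not isinstance(scripts, dict):
        return {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


def scan_node_modules(root: str = "node_modules") -> dict:
    """Map each package.json path under `root` to its install hooks, if it has any."""
    findings = {}
    for path in Path(root).glob("**/package.json"):
        try:
            hooks = install_hooks(json.loads(path.read_text(encoding="utf-8")))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skip it
        if hooks:
            findings[str(path)] = hooks
    return findings
```

Many legitimate packages use these hooks to compile native code, so expect some hits. When you want to inspect a new dependency before anything runs, npm install --ignore-scripts installs it without executing lifecycle scripts.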

Why AI Coding Tools Make This Worse

Here is the uncomfortable truth about how AI code assistants work: Claude Code, Cursor, GitHub Copilot, and every other AI coding tool learned their suggestions from code on the internet, including GitHub. When the model sees your function signature and suggests an import, it is pattern-matching against millions of training examples. Some of those examples imported legitimate packages. Some imported packages that were later compromised. Some imported packages from repos that were malicious from day one.

The problem is not that AI tools are deliberately suggesting malware. The problem is structural:

AI suggestions carry implicit authority. Developers have been trained to trust autocomplete. When the IDE suggests import colors from 'coloers', most developers accept it without checking the npm registry manually.
The accept-and-move-on workflow bypasses scrutiny. The entire UX of AI coding tools is optimized for flow state. Tab to accept, tab to accept, tab to accept. Stopping to audit an import breaks that rhythm, and most developers do not stop.
AI suggestions look more authoritative than a Stack Overflow answer. When a human on Stack Overflow suggests a package, you might click the link and check the download count. When Claude Code suggests it inline in your editor, the package name is just... there. Ready to accept.
AI tools do not verify package existence or integrity. None of the major AI coding assistants queries the package registry to confirm that the suggested package is real, uncompromised, and not known to be malicious before presenting the suggestion.

Warning: AI coding tools do not perform real-time registry checks. A suggestion that looks like a legitimate import may reference a typosquatted or malicious package that the model learned from compromised training data.
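The missing registry check is easy to approximate yourself. The sketch below asks the public npm or PyPI metadata endpoint whether a name exists at all. The function names and the injectable opener parameter are my own choices for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

# Public registry metadata endpoints.
REGISTRY_URLS = {
    "npm": "https://registry.npmjs.org/{name}",
    "pypi": "https://pypi.org/pypi/{name}/json",
}


def registry_url(name: str, ecosystem: str) -> str:
    """Build the metadata URL, encoding the slash in scoped npm names."""
    return REGISTRY_URLS[ecosystem].format(name=urllib.parse.quote(name, safe="@"))


def package_exists(name: str, ecosystem: str, opener=urllib.request.urlopen) -> bool:
    """True if the registry knows the name, False on 404; other errors propagate."""
    try:
        with opener(registry_url(name, ecosystem), timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Existence is a necessary check, not a sufficient one. A typosquatted package exists by definition, so treat a True result as "keep looking" (age, maintainers, download history), not as a clean bill of health.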

The result is a threat model that did not exist five years ago. Before AI coding tools, developers typed imports manually and were at least slightly more deliberate about what they were adding. The friction was a feature. AI coding tools removed that friction entirely, and in doing so, removed one of the few human checkpoints in the dependency ingestion pipeline.

What AI Suggestions Your Codebase Has Already Accepted

If you have been using an AI coding tool for more than a few weeks, there is a reasonable chance you have accepted at least one dependency suggestion without auditing it. Here is how to find out what is in your codebase:

Step 1: Find recently added imports

`# Find import lines added in the last 30 days via git log.
# --diff-filter=AM covers both new files and modified files.
git log --since="30 days ago" --diff-filter=AM -p -- "*.js" "*.ts" "*.py" "*.go" \
  | grep -E '^\+[^+]' \
  | grep -E "(import|require|from)" \
  | sort -u`
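The raw grep output is a list of lines, not a list of packages. A small parser can reduce it to names worth checking. This is a sketch under clear limits: it handles common Python and JavaScript/TypeScript import forms only (Go is not covered), skips relative paths, and does not know which Python modules are standard library.

```python
import re

# JavaScript/TypeScript forms: import x from 'pkg', import 'pkg', require('pkg')
JS_PATTERNS = [
    re.compile(r"""^\s*(?:import|export)\b.*?\bfrom\s+['"]([^'"]+)['"]"""),
    re.compile(r"""^\s*import\s+['"]([^'"]+)['"]"""),
    re.compile(r"""\brequire\(\s*['"]([^'"]+)['"]\s*\)"""),
]
# Python forms: from pkg import x, import pkg
PY_PATTERNS = [
    re.compile(r"^\s*from\s+([A-Za-z_][\w.]*)\s+import\b"),
    re.compile(r"^\s*import\s+([A-Za-z_][\w.]*)"),
]


def added_packages(diff_lines):
    """Return sorted top-level package names from the added ('+') lines of a diff."""
    found = set()
    for line in diff_lines:
        if not line.startswith("+") or line.startswith("+++"):
            continue  # keep only added lines, skip the file header
        code = line[1:]
        for pattern in JS_PATTERNS:
            match = pattern.search(code)
            if match:
                spec = match.group(1)
                if not spec.startswith((".", "/")):  # ignore relative/absolute paths
                    parts = spec.split("/")
                    # Scoped names keep two segments: @scope/name
                    found.add("/".join(parts[:2]) if spec.startswith("@") else parts[0])
                break
        else:
            for pattern in PY_PATTERNS:
                match = pattern.search(code)
                if match:
                    found.add(match.group(1).split(".")[0])
                    break
    return sorted(found)
```

Feed it the output of git log -p (or git diff) split into lines, then compare the result against the dependencies you actually meant to add.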

Step 2: Extract your full dependency list

`# For Node.js projects
cat package.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
deps = {**data.get('dependencies', {}), **data.get('devDependencies', {})}
for pkg in sorted(deps):
    print(pkg)
" > /tmp/my-deps.txt

# For Python projects
pip freeze > /tmp/my-deps.txt`

Step 3: Run SCA against known malicious package lists

`# Check each package against the OSV (Open Source Vulnerabilities) database.
# The ecosystem is hardcoded to npm; OSV expects "PyPI" for pip freeze output.
while IFS= read -r pkg; do
  # Drop a ==version suffix; unlike cutting on '@', this keeps scoped names such as @scope/name
  pkg_name="${pkg%%=*}"
  result=$(curl -s "https://api.osv.dev/v1/query" \
    -H "Content-Type: application/json" \
    -d "{\"package\":{\"name\":\"$pkg_name\",\"ecosystem\":\"npm\"}}" \
    | python3 -c "import json,sys; d=json.load(sys.stdin); n=len(d.get('vulns',[])); print(f'VULN: {n} findings for $pkg_name') if n else None" 2>/dev/null)
  [ -n "$result" ] && echo "$result"
done < /tmp/my-deps.txt`

[...]

`    # [...]
    if [ "$VULNS" -gt 0 ] 2>/dev/null; then
      echo "BLOCKED: $pkg has $VULNS known vulnerabilities. Review before committing."
      exit 1
    fi
  done
fi

echo "SCA gate: all new dependencies clean."`
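The OSV lookups in the scripts above make one HTTP request per package. OSV also exposes a batch endpoint, /v1/querybatch, that takes many queries in one call. Here is a minimal sketch of using it from Python; the helper names are mine, and the response handling assumes results come back in the same order as the queries.

```python
import json
import urllib.request

OSV_BATCH_URL = "https://api.osv.dev/v1/querybatch"


def build_batch_query(names, ecosystem="npm"):
    """Build the request body for OSV's batch endpoint, one query per package."""
    return {"queries": [{"package": {"name": n, "ecosystem": ecosystem}} for n in names]}


def flagged(names, response):
    """Pair each name with its advisory IDs; packages with no findings are omitted."""
    hits = {}
    for name, result in zip(names, response.get("results", [])):
        ids = [v["id"] for v in result.get("vulns", [])]
        if ids:
            hits[name] = ids
    return hits


def query_osv(names, ecosystem="npm"):
    """POST the batch query and return {package: [advisory ids]}."""
    names = list(names)
    body = json.dumps(build_batch_query(names, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_BATCH_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return flagged(names, json.load(response))
```

Ecosystem names are case-sensitive in OSV ("npm", "PyPI"), so pass the one that matches your manifest. An empty dict from query_osv means OSV has no advisories for those names, which is not the same as the packages being safe.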

Layer 2: GitHub Actions SCA workflow

`# .github/workflows/sca.yml
name: Supply Chain Audit

on:
  pull_request:
    paths:
      - 'package.json'
      - 'package-lock.json'
      - 'requirements*.txt'
      - 'Pipfile'
      - 'pyproject.toml'
      - 'go.mod'
      - 'Cargo.toml'

jobs:
  sca-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so the base...HEAD diff below can find a merge base

      - name: Install osv-scanner
        run: |
          curl -L https://github.com/google/osv-scanner/releases/latest/download/osv-scanner_linux_amd64 \
            -o /usr/local/bin/osv-scanner && chmod +x /usr/local/bin/osv-scanner

      - name: Run OSV Scanner
        run: osv-scanner --format table .

      - name: Check for AI-introduced packages
        run: |
          # Diff against base branch to find newly added packages.
          # Matches added "name": "version" pairs; other added string fields
          # in package.json (e.g. "version") will also show up here.
          git fetch origin ${{ github.base_ref }}
          git diff origin/${{ github.base_ref }}...HEAD -- package.json \
            | grep -E '^\+[^+]' \
            | grep -oE '"[a-z0-9@][a-z0-9._/-]*":[[:space:]]*"[^"]+"' \
            > /tmp/new-packages.txt || true
          echo "New packages in this PR:"
          cat /tmp/new-packages.txt`
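Grepping a diff is quick but noisy. A more precise way to list the packages a PR introduces is to parse both versions of package.json and take the set difference. A minimal sketch, assuming you already have the two file contents as text (for example from git show origin/main:package.json and the working copy):

```python
import json


def declared(manifest_text: str) -> dict:
    """Merge dependencies and devDependencies from package.json text."""
    data = json.loads(manifest_text)
    return {**data.get("dependencies", {}), **data.get("devDependencies", {})}


def new_packages(base_text: str, head_text: str) -> dict:
    """Return {name: version_range} for packages present in head but not in base."""
    base, head = declared(base_text), declared(head_text)
    return {name: head[name] for name in sorted(head.keys() - base.keys())}
```

This reports only names that are new to the manifest. Version bumps of existing packages are deliberately ignored here; they deserve their own check.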

Layer 3: MCP integration pattern for Claude Code

This is where LucidShark plugs in directly. Rather than waiting until commit time, you can surface SCA results inside the Claude Code session before the dependency is even written to disk.

`# lucidshark.config.yaml
checks:
  sca:
    enabled: true
    ecosystems:
      - npm
      - pypi
      - go
      - cargo
    block_on:
      - critical
      - high
    warn_on:
      - moderate
    datasources:
      - osv
      - snyk
      - github-advisories
  dependency_validation:
    check_registry_existence: true
    flag_new_packages: true
    flag_unlisted_scopes: true`
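To make the block_on and warn_on settings concrete, here is one way a policy of that shape could be evaluated. This is an illustration of the idea only, not LucidShark's implementation (which this article does not show); the function name and the (package, severity) tuple format are assumptions.

```python
def evaluate(findings, block_on=("critical", "high"), warn_on=("moderate",)):
    """Classify (package, severity) findings into a decision plus the matching items."""
    blocked = [f for f in findings if f[1].lower() in block_on]
    warned = [f for f in findings if f[1].lower() in warn_on]
    if blocked:
        return "block", blocked
    if warned:
        return "warn", warned
    return "pass", []
```

The point of a deterministic rule like this is that the same manifest always produces the same decision, which is what makes it usable as a gate.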

LucidShark's Role in the Supply Chain Gate

LucidShark runs SCA as one of its built-in checks alongside complexity analysis, test coverage, coupling metrics, and duplication detection. The design decision to run everything locally matters here specifically because of supply chain risk: when you run a cloud-based scanner, your dependency manifest (a complete map of your software's attack surface) leaves your machine. Local execution means zero data exfiltration risk from the tool itself.

The MCP integration means Claude Code can surface SCA findings inline. If you ask Claude Code to add a dependency and that package has known vulnerabilities, the MCP tool call returns the finding before the package.json is modified. The gate is in the workflow, not bolted on after the fact.

The checks run in under 60 seconds on a typical project. For teams shipping AI-generated code daily, that is the difference between catching a malicious import before it reaches production and reading about your incident in a post-mortem.

Note: LucidShark's SCA check queries OSV.dev and cross-references GitHub Advisory Database entirely locally. No dependency names, manifest contents, or source code leave your machine during the scan.

The 10,000 malicious repos story is not an isolated event. It is evidence of a maturing attacker playbook that has specifically adapted to the AI coding era. Attackers know developers using AI tools accept suggestions faster than they audit them. They are building infrastructure at scale to exploit exactly that behavior. The defense is not slower coding. It is inserting automated, deterministic checks that run faster than the attack.

Try LucidShark: Install via npm (npm install -g lucidshark), run lucidshark analyze in your repo, and get SCA results alongside complexity, coverage, and coupling metrics in under 60 seconds. Runs entirely local, no data leaves your machine, integrates with Claude Code via MCP. lucidshark.com
