Jaydeep Shah (JD)

Posted on Jun 16

What I Learned About "On-Device AI": Most of It Is a Promise, Not a Guarantee

#edgeai #android #privacy

"On-device AI" is everywhere right now. Apple says it. Google says it. Samsung says it. But the term gets used to describe architectures that have almost nothing in common with each other. Some of them send your data to a server. Some of them literally cannot.

I spent the Qualcomm x Google LiteRT Developer Hackathon 2026 building Redacto - an on-device PII redaction app that runs Gemma 4 E2B entirely on a Snapdragon 8 Elite, no cloud involved. Through that process, I discovered that "on-device" is not a binary. It is a spectrum, and where your app falls on that spectrum has real consequences for privacy, compliance, and trust.

I found four levels, and the industry blurs all of them

| Level | What Happens | Network Required | Your Data Leaves the Device |
| --- | --- | --- | --- |
| 1. Cloud AI | Model runs on remote servers. Your input is sent over the internet, processed, and the result is returned. | Yes | Yes |
| 2. On-device with telemetry | Model runs locally. But the app phones home - analytics, crash reports, model update checks, usage metrics. | Yes | Partially (metadata, not input data - in theory) |
| 3. Hybrid / fallback | Model runs locally when possible. Complex queries fall back to a cloud model. | Yes | Sometimes (when fallback triggers) |
| 4. Zero-trust on-device | Model runs locally. App has no network access at all. Cannot send data even if compromised. | No | No - structurally impossible |

Most apps that market themselves as "on-device" fall into levels 2 or 3. Many flagship phone AI features still require network connectivity for certain operations. They run inference locally, but they still open network connections. That is not a criticism - telemetry and hybrid fallback are legitimate design choices. But they are different from zero-trust, and the difference matters when the data is sensitive.
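One way to make the four levels concrete is to write them down as a decision rule. The sketch below is only an illustration of the taxonomy above: the `Level` enum, the `classify_app` function, and its three boolean inputs are hypothetical names chosen for this example, not part of any Android API. It deliberately places any locally running app that declares INTERNET at level 2 or 3, even if it sends nothing today, because the question here is what the app can do, not what it promises.

```python
from enum import Enum


class Level(Enum):
    CLOUD = 1
    ON_DEVICE_WITH_TELEMETRY = 2
    HYBRID_FALLBACK = 3
    ZERO_TRUST_ON_DEVICE = 4


def classify_app(runs_model_locally, declares_internet, has_cloud_fallback):
    """Map three yes/no facts about an app to one of the four levels.

    All three parameters are assumptions of this sketch; in practice only
    declares_internet can be read straight from the manifest.
    """
    if not declares_internet:
        # Without network access, remote inference and fallback cannot exist.
        if not runs_model_locally or has_cloud_fallback:
            raise ValueError("cloud inference or fallback needs network access")
        return Level.ZERO_TRUST_ON_DEVICE
    if not runs_model_locally:
        return Level.CLOUD
    if has_cloud_fallback:
        return Level.HYBRID_FALLBACK
    # Local model, network available, no fallback: the app can still phone home.
    return Level.ON_DEVICE_WITH_TELEMETRY
```

Note that only the first branch depends on something the operating system enforces. The other three levels are distinguished by what the developer says the app does.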

The Android kernel enforces a hard line

On Android, there is a clean technical boundary between "we choose not to send data" and "the app cannot send data." It comes down to one declaration in AndroidManifest.xml:

<uses-permission android:name="android.permission.INTERNET" />

If your app declares android.permission.INTERNET, the OS grants it the ability to open network sockets. If it does not, the kernel refuses to let your process create internet (AF_INET / AF_INET6) sockets. This is not an app-level policy. It is enforced at the kernel level by Android's UID-based networking rules - each app runs as its own Linux user, and membership in the inet group (AID_INET), which the system grants only to apps holding the permission, controls whether that user can create those sockets.

This is a structural guarantee, not a policy promise. A compromised app without INTERNET permission cannot exfiltrate data over the network because the operating system will not let it open a connection.
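From inside a process, that denial shows up as a permission error at the moment a socket is requested. The probe below is a minimal sketch of what that looks like, written in Python for brevity; the function name `can_open_inet_socket` and its `factory` parameter are hypothetical, and the exact error code Android returns may vary by release. In an Android app written in Kotlin or Java, the same refusal would surface as an exception when the socket is created.

```python
import socket


def can_open_inet_socket(factory=socket.socket):
    """Return True if this process may create an IPv4 TCP socket.

    Only socket creation is attempted; nothing is connected or sent.
    The factory parameter exists so the check can be exercised with a fake.
    """
    try:
        sock = factory(socket.AF_INET, socket.SOCK_STREAM)
    except PermissionError:
        # Python raises PermissionError for EACCES and EPERM, i.e. the OS
        # refused to hand out the socket at all.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("internet sockets allowed:", can_open_inet_socket())
```

The point of the probe is where the failure happens: at socket creation, before any address is resolved or any byte is written.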

Why the distinction matters

HIPAA and health data. Under HIPAA, if your app processes Protected Health Information (PHI), you typically need a Business Associate Agreement (BAA) with any third party that might receive that data. If PHI never leaves the device - structurally, not by policy - there is no third party receiving it. The BAA question does not arise in the same way. "We promise not to send your medical records" and "the app literally cannot send your medical records" have very different standing.

Note: This discussion is for educational purposes. Consult a HIPAA compliance specialist for your specific situation.

Legal and classified contexts. For attorneys handling privileged communications, journalists protecting sources, or organizations working with classified material, "we promise not to send it" is insufficient. The question is whether the architecture makes exfiltration impossible, not just unintended.

User trust. Users increasingly understand that "AI features" often means "your data gets sent somewhere." A zero-trust architecture lets you make a simple, verifiable claim: the app has no network permission, and you can confirm that yourself in your device settings.

Edge computing is not on-device

One more distinction worth drawing: "edge computing" and "on-device" are not synonyms. Edge computing means processing happens closer to the user than a centralized cloud data center - at a cell tower, a local server, a regional node. The data still leaves the user's device. It just travels a shorter distance. On-device means processing happens on the user's actual hardware. The data does not leave.

When your phone sends a query to a server in a nearby data center, that is edge computing. When your phone runs an on-device model without any network call, that is on-device. Edge computing reduces latency. On-device eliminates data transmission. These are different properties solving different problems.

How Redacto implements zero-trust

We made a deliberate architectural decision: no INTERNET permission. The app runs a 4-step LLM pipeline - Classify, Detect, Redact, Validate - entirely on-device using LiteRT-LM and Gemma 4 E2B. All inference happens in-process. There is no analytics SDK, no crash reporting service, no model update mechanism that phones home.
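To show the shape of a four-stage, in-process pipeline, here is a toy sketch. Only the stage names (Classify, Detect, Redact, Validate) come from Redacto. Everything else is an assumption made for illustration: the function names, the `Span` type, and above all the regular expressions, which stand in for the on-device model calls and are not how Redacto detects PII.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for model calls: two simple regexes, NOT Redacto's detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class Span:
    label: str
    start: int
    end: int


def classify(text):
    """Step 1: pick a coarse document type (placeholder keyword rule)."""
    return "medical" if "patient" in text.lower() else "general"


def detect(text, doc_type):
    """Step 2: locate candidate PII spans. doc_type is unused in this sketch."""
    spans = [
        Span(label, match.start(), match.end())
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


def redact(text, spans):
    """Step 3: replace each span with a [LABEL] tag.

    Works right to left so earlier offsets stay valid; assumes spans do not overlap.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.label}]" + text[span.end:]
    return text


def validate(redacted):
    """Step 4: re-scan the output; True means no pattern still matches."""
    return not any(p.search(redacted) for p in PATTERNS.values())


def run_pipeline(text):
    """Run all four stages in one process, with no I/O of any kind."""
    doc_type = classify(text)
    spans = detect(text, doc_type)
    redacted = redact(text, spans)
    if not validate(redacted):
        raise ValueError("redaction left detectable PII behind")
    return redacted
```

Every stage is a plain function call on data already in memory, which is what lets the whole pipeline run in an app that has no network permission.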

The tradeoff is real: no cloud fallback for hard queries, no over-the-air model updates, no remote crash diagnostics. For a privacy-critical app handling medical records and legal documents, that tradeoff is worth it.

The one question that settles it

"On-device" is not one thing. It is a spectrum from "runs locally but phones home" to "structurally cannot access the network." When you are handling PHI or privileged communications, the distinction between policy and structure is the whole ballgame.

Ask one question: does the app declare android.permission.INTERNET? If yes, it can send data. If no, it cannot. The rest is architecture.
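That question can be answered mechanically. The sketch below assumes you already have the app's manifest as plain-text XML (the manifest inside an APK is stored in a binary form, so it must be decoded first) and checks whether INTERNET is among the declared permissions. The function names `declared_permissions` and `declares_internet` are hypothetical helpers written for this example.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
INTERNET = "android.permission.INTERNET"


def declared_permissions(manifest_xml):
    """Return the set of permission names declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    names = set()
    # Both element types declare permissions; they are direct children of <manifest>.
    for tag in ("uses-permission", "uses-permission-sdk-23"):
        for element in root.findall(tag):
            name = element.get(f"{{{ANDROID_NS}}}name")
            if name:
                names.add(name)
    return names


def declares_internet(manifest_xml):
    """True if the manifest declares android.permission.INTERNET."""
    return INTERNET in declared_permissions(manifest_xml)
```

Run the check against the manifest of the built APK rather than the one in your source tree: the build merges library manifests into the final one, so a dependency can add INTERNET without you ever typing it. The Android SDK build tools can also list a built APK's permissions directly, for example with `aapt2 dump permissions app.apk`.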

Related in this series of "Edge AI from the Trenches"

What I Learned by Dissecting gemma-4-E2B-it_qualcomm_sm8750.litertlm - what each segment of a model filename means, from family to file format
Why My LLM Runs 4x Faster on Hardware I Had Never Heard Of - the hardware that powers on-device inference
Why Single-Pass LLM Redaction Fails - the pipeline architecture Redacto uses for on-device processing

Jaydeep Shah is a developer with roots in embedded systems, Android platform internals, and silicon-level AI optimization. He now explores on-device AI inference - bringing models from the cloud to phones and edge hardware. Along with his team Edge Artists, he builds applications using LiteRT-LM and Gemma models on mobile hardware, and writes about what works, what breaks, and what he learns along the way. This post is part of the Edge AI from the Trenches series.

Last updated: June 2026
8th of 22 posts in the "Edge AI from the Trenches" series
