saboor

Posted on Jun 18

Your Thesis is Being Used to Train Its Own Replacement: The Ethics of Cloud-Based AI Detection

#ai #plagiarism

Team

Abdul Saboor Hamedi
Rosyaida Saara Hellena
Akhmad Ghozal
Novar Kurniawan

All of the source code is available in the project repository.

The Hidden Cost of Good Writing

There is a growing fear in schools and universities today. Students who spend hours writing perfect, professional papers are being wrongly accused of cheating.

Many online AI detectors rely on simple statistical checks, so they struggle to tell the difference between a highly disciplined human writer and a machine. To a basic detector, clean and correct English can look exactly like something a robot wrote.

To fix this, we built the Neural Lab, a private, offline software tool that lets you check your own writing on your own computer without giving up your privacy.

Free Online Detectors Steal Your Work

Most popular online AI detectors seem free, but you actually pay with your data.

The Risk: When you upload your essay or thesis to a cloud website, you may give up control over it. Depending on its terms of service, the company can use your original ideas to train and improve its future AI systems.
The Solution: The Neural Lab runs completely on your own computer hardware. Your data never leaves your machine, and a fast local pipeline can scan a very large document in under 2 seconds.

How Neural Lab Works Under the Hood

To keep your data 100% private, the entire system is built to run locally on your own machine, completely bypassing the cloud. The architecture figure shows how the project handles your text using a fast, secure setup; the letters in parentheses below refer to the components in that figure:

1. Ingestion (The Gateway)

The system workflow begins with how it safely receives your documents.

Frontend Editor (A): This is the simple screen where you paste your raw text or upload your PDF files to be checked.
FastAPI Gateway (B): This acts like a private front door on your computer. It safely takes your document without letting any outside websites spy on it, keeping your files completely private.

2. Pre-processing (The Smart Traffic Controller)

Once inside, the tool checks the document size and breaks it down.

Document Length Check (C): The system automatically counts the characters to choose the fastest way to handle your paper.
Fast Heuristic Router (D): If your document is massive (over 30,000 characters), it takes this shortcut path to keep your computer from slowing down or freezing.
Deep Neural Router (E): If your document is standard or short (under 30,000 characters), it takes this path for a deeper, highly accurate test.
Splitter (F): No matter the length, the tool cuts the text into separate sentences so it can scan your writing piece by piece.
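
The routing and splitting steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual code: the function names are hypothetical, the sentence splitter is a naive regex, and how a document of exactly 30,000 characters is routed is an assumption.

```python
import re

# Threshold taken from the article. Sending a document of exactly
# 30,000 characters down the neural path is an assumption of this sketch.
LENGTH_THRESHOLD = 30_000

def choose_route(text: str) -> str:
    """Return "heuristic" for very long documents, "neural" otherwise."""
    return "heuristic" if len(text) > LENGTH_THRESHOLD else "neural"

# Naive splitter: break after ., ! or ? when whitespace follows.
# A real splitter would also handle abbreviations, quotes, and so on.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
```

Counting characters with len() is cheap, so the route can be chosen before any heavy analysis starts.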

3. Forensic Instruments (The Three Examiners)

Next, three different local tests scan your text fragments at the exact same time:

Neural Analysis (G): Uses an AI model (GPT-2) running directly on your computer to measure how predictable and robotic your word patterns look.
Lexical Analysis (H): Checks your personal writing habits by looking at your sentence rhythm and how many unique words you choose.
Integrity Audit (I): Uses pg_trgm, a PostgreSQL extension for fast trigram-based text matching, to check your text against your own private saved drafts to see if you are reusing your older work.
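
To give a feel for what a trigram check measures, here is a small Python approximation of the kind of similarity score pg_trgm produces: each word is lowercased, padded with two spaces in front and one behind, cut into three-character pieces, and two texts are compared by how many pieces they share. The real work happens inside PostgreSQL; the function names here are hypothetical and this sketch only handles ASCII letters and digits.

```python
import re

def trigrams(text: str) -> set[str]:
    """Collect the set of 3-character pieces from every word in the text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A score near 1.0 means the sentence is almost identical to something in your saved drafts, while a score near 0.0 means the wording is new.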

Global Verdict: 81% Neural

What Your Score Means: The engine scores every sentence and combines the results into one overall figure. For example, a document scoring an 81% Neural Signature shows very consistent machine-like text patterns, meaning the system is highly confident that the writing resembles the output of an AI model.

4. Score Fusion (The Math Combiner)

After gathering data from all three tests, the system blends the signals together so it doesn't make mistakes or rely on unfair guesses.

Forensic Calculator (J): This acts as the central brain, combining the neural scores, vocabulary patterns, and draft history.
Sigmoid Normalization: It runs these mixed numbers through a smooth mathematical curve (a sigmoid) that squeezes them into a range from 0 to 1, giving you a final score for every single sentence. Long, substantive paragraphs are also given more weight than tiny connector phrases when the overall result is calculated.
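
As a rough sketch of how this kind of fusion can work, the Python below combines three signals into one sentence score and then averages the sentences by length. The weights, the bias, and the idea that each signal arrives as a number between 0 and 1 (higher meaning more machine-like) are all assumptions made for illustration; the article does not give the real values.

```python
import math

# Hypothetical weights and bias, chosen only for this example.
WEIGHTS = {"neural": 2.0, "lexical": 1.0, "integrity": 1.0}
BIAS = -2.0

def sigmoid(x: float) -> float:
    """Squeeze any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def fuse_sentence(signals: dict[str, float]) -> float:
    """Blend the three per-sentence signals into one probability."""
    raw = BIAS + sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return sigmoid(raw)

def document_score(sentences: list[tuple[str, float]]) -> float:
    """Average the sentence probabilities, weighted by sentence length."""
    total_chars = sum(len(text) for text, _ in sentences)
    if total_chars == 0:
        return 0.0
    return sum(len(text) * p for text, p in sentences) / total_chars
```

With length weighting, a 90-character paragraph counts nine times as much as a 10-character connector phrase in the overall result.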

5. Persistence & Presentation (The Safe and Display)

The final step securely logs your data and builds your visual dashboard.

PostgreSQL Database (L): Your scan histories, text pieces, and final scores are safely saved right on your own hard drive under your complete control.
3-Tier Assembly Engine (M): Uses a clean system build to instantly package your saved metrics for your screen.
Visual UI (N): Brings up a clear, color-coded editor where you can view your highlighted sentences alongside automated metric graphs.

Why Being "Too Perfect" Makes You Look Like a Bot

Basic detectors use a single-metric approach, meaning they only look at one thing: word patterns.

AI models are trained largely on polished, formal English.
If a human writes a flawless paper, a basic detector may flag it as AI simply because it lacks mistakes.
The Neural Lab solves this by testing multiple patterns at once, including how tightly packed the information is.

Real Human Writing Has a "Heartbeat"

Human writers naturally change their rhythm. We talk and write in irregular bursts:

We constantly mix short, punchy statements with long, complicated sentences.
This creates a bumpy, natural reading rhythm.
AI models tend to produce much smoother text, with sentences of similar length and complexity, which can make them sound flat and robotic.
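
One simple way to put a number on this rhythm is to measure how much sentence lengths vary. The Python sketch below uses the coefficient of variation (standard deviation divided by the mean) of the word count per sentence. Treating this as the "heartbeat" measure is an assumption for illustration, and the function names are hypothetical.

```python
import re
import statistics

def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text: str) -> float:
    """Standard deviation of sentence length divided by the mean length."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # rhythm cannot be measured from a single sentence
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Text where every sentence has the same number of words scores 0.0, and the score rises as short and long sentences are mixed.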

Territory Coverage Breakdown

Mapping Your Document: The system sorts your text into three clear zones based on probability thresholds, showing you exactly how much of your document looks human versus machine.

Exact AI Match: High AI signals (probability over 60%).
Minor AI Changes: Mixed text that looks like a hybrid blend (probability 25% to 60%).
Human Written: Clean text showing authentic human signatures (probability under 25%).
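
The three zones above translate directly into a small lookup. In this Python sketch the thresholds come from the list above, while the function names and the choice to measure coverage by character count (rather than by number of sentences) are assumptions.

```python
def zone(probability: float) -> str:
    """Map a sentence's AI probability (0.0 to 1.0) to its zone name."""
    if probability > 0.60:
        return "Exact AI Match"
    if probability >= 0.25:
        return "Minor AI Changes"
    return "Human Written"

def coverage(sentences: list[tuple[str, float]]) -> dict[str, float]:
    """Share of the document's characters that falls into each zone."""
    totals = {"Exact AI Match": 0, "Minor AI Changes": 0, "Human Written": 0}
    for text, probability in sentences:
        totals[zone(probability)] += len(text)
    whole = sum(totals.values())
    return {name: (count / whole if whole else 0.0)
            for name, count in totals.items()}
```

The three shares always add up to 1.0 for a non-empty document, so they can be drawn as a single stacked bar.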

The Element of Surprise

AI text is highly predictable because the machine tends to select common, statistically likely words to come next.

The Neural Lab measures how surprised its local baseline engine is by your text.
Because human minds are creative, we choose unexpected words and unique phrasing that break the machine’s mathematical expectations.
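
This kind of surprise is usually reported as perplexity. The Python sketch below assumes a local language model (such as the GPT-2 model mentioned earlier) has already produced a natural-log probability for each token in the text, and only shows the final arithmetic. The function name and the input format are hypothetical.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the average negative log-probability per token.

    token_logprobs holds the natural-log probability the model gave
    to each token that actually appeared in the text.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

If the model gave every token a probability of 0.5, the perplexity is 2, as if it were choosing between two equally likely words each time. Lower values mean the text was easy to predict; higher values mean the model was surprised more often.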

Your Typos Prove You Are Human

In a digital world where perfection belongs to machines, human mistakes have become a superpower.

The Neural Lab scans your text for natural human typing mistakes, like swapped letters or missed apostrophes.
While an AI can be told to fake a mistake, it struggles to copy the chaotic, organic pattern of a real human slip of the finger.
If the tool finds a genuine human typo, it instantly drops the AI suspicion score.

The "Furthermore" Trap

AI models tend to constantly recycle a small list of transition words like furthermore, moreover, consequently, and essentially.

If your paper repeats these specific words too often, your vocabulary score drops.
Humans naturally use a much wider variety of words when explaining complex ideas.
The Neural Lab uses a dashboard vocabulary gauge to track this; using a rich, diverse set of words provides a strong indicator that a human wrote the text.
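
Two simple numbers capture the idea behind this gauge: how many of your words are unique, and how often the listed transition words appear. The Python sketch below is only an illustration. The word list is the one named above, while the function names and the exact formulas are assumptions, not the dashboard's real calculation.

```python
import re

# The four transition words named in the article.
TRANSITIONS = {"furthermore", "moreover", "consequently", "essentially"}

def words(text: str) -> list[str]:
    """Lowercased words, keeping apostrophes inside contractions."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text: str) -> float:
    """Unique words divided by total words (higher means more variety)."""
    tokens = words(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def transition_rate(text: str) -> float:
    """Share of all words that come from the transition list."""
    tokens = words(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in TRANSITIONS) / len(tokens)
```

Keep in mind that the unique-word ratio naturally falls as a text gets longer, so it is best compared between passages of similar length.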

Model Classification Matrix

Behind the Accuracy Numbers: To protect authentic academic writers from false plagiarism accusations, the local engine tracks its own performance metrics to keep its precision as high as possible.

Precision Control: 95.20% accuracy in separating human text from AI.
Confidence Rating: 94.90% reliability score based on signal data.
Recall Sensitivity: 96.10% success rate in catching actual AI patterns.
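
For readers unfamiliar with these terms, the Python sketch below shows the standard textbook definitions of precision, recall, and accuracy, computed from counts of correct and incorrect decisions. It is a general illustration and does not reproduce how the figures above were measured; the function names are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything flagged as AI, the share that really was AI."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0

def recall(tp: int, fn: int) -> float:
    """Of all real AI text, the share the detector caught."""
    actual = tp + fn
    return tp / actual if actual else 0.0

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all decisions, human or AI, that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

Here tp counts AI text correctly flagged, fp counts human text wrongly flagged, tn counts human text correctly cleared, and fn counts AI text that slipped through. For students, precision is the number that matters most, because every false positive is a human writer wrongly accused.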

Protecting Your Voice

AI detectors must stop using single, blind guesses to accuse students. The Neural Lab fixes this by making sure your main paragraphs hold more weight than small connecting sentences.

As AI continues to make all writing look identical, your unique rhythm, diverse vocabulary, and even your mistakes are your best defense to prove your humanity.

DEV Community