Hey everyone, welcome back to the blog. If you’ve been scrolling through Hacker News today, you probably saw a deeply unsettling headline: a police officer is currently under investigation for allegedly using generative AI to "create evidence" in multiple active cases. It’s the kind of story that makes your stomach drop, but as developers, engineers, and security folks, it should also make us prick up our ears. This isn't just a failure of human ethics; it's a catastrophic failure of data integrity, chain of custody, and systems architecture.
As builders, we are deploying LLMs, retrieval-augmented generation (RAG) pipelines, and automated data processors into production every single day. We are hooking these systems up to critical business databases, medical records, financial transactions, and legal workflows. Today's news is a stark, real-world warning: if you cannot cryptographically prove where a piece of data came from, who generated it, and whether it has been altered, your system is a liability.
In this post, we’re going to step away from the ethical shockwave and look at this through a pure engineering lens. How do we build systems that make data tampering detectable? How do we establish a verifiable "chain of custody" for digital assets using modern cryptography, digital signatures, and provenance frameworks? Let's dive in.
The Core Problem: The Illusion of "Untampered" Digital Assets
In the physical world, police use a physical chain of custody: a weapon is placed in a sealed bag, signed by an officer, logged into a secure room, and every hand that touches it is recorded on a paper trail. In the digital world, we’ve historically been incredibly lazy about this. We rely on metadata (like file creation timestamps or database updated_at columns) which can be trivially forged, or we rely on access control lists (ACLs) which can be bypassed by anyone with root database access.
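To make the "trivially forged" point concrete, here is a minimal sketch using only Node's built-in modules. The scratch directory and file name are hypothetical stand-ins for a stored record. It rewrites a file, backdates its modification time, and shows that only a content hash notices the change.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const crypto = require('crypto');

// Hypothetical scratch file standing in for a stored record.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'custody-'));
const file = path.join(dir, 'report.txt');
fs.writeFileSync(file, 'original notes');

// SHA-256 of the file's current content, as a hex string.
const sha256 = (p) =>
  crypto.createHash('sha256').update(fs.readFileSync(p)).digest('hex');
const hashBefore = sha256(file);

// Anyone with write access can change the content and backdate the file.
fs.writeFileSync(file, 'rewritten notes');
const forged = new Date('2001-01-01T00:00:00Z');
fs.utimesSync(file, forged, forged);

const forgedYear = fs.statSync(file).mtime.getUTCFullYear();
const hashAfter = sha256(file);

console.log('mtime year now reads:', forgedYear);
console.log('content hash changed:', hashBefore !== hashAfter);
```

The hash only helps if the original value was recorded somewhere the editor cannot reach, which is what the signing and immutable-storage steps later in this post are for.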
When generative AI enters the mix, the threat vector changes. An insider with system access doesn't just edit a database record; they can generate entirely realistic, context-aware synthetic data (PDFs, images, logs, or text transcriptions) and insert them into a system. To prevent this, we must adopt a zero-trust architecture for data. We must assume that:
- The database administrator could be malicious.
- The API endpoints could be compromised.
- The storage buckets (like AWS S3) can have their contents replaced.
To solve this, we need to implement cryptographic provenance. Every piece of data must be signed at the point of origin, and every state transition must be verifiable.
Architecting Cryptographic Chain of Custody
Let's design a system architecture that guarantees data integrity from the moment a file or log is created. Whether you are building an evidence management system, a financial auditing ledger, or an enterprise document archive, the architecture remains the same.
Our architecture relies on three pillars:
- Asymmetric Cryptography (Digital Signatures): The creator (a device, a user, or a validated system process) signs the data using a private key. Anyone can verify it using the corresponding public key.
- Immutable Logs (Append-Only Ledgers): Once a state change is recorded, it can never be deleted or modified without breaking a cryptographic chain (similar to git commit hashes).
- Decentralized Identifiers (DIDs) / PKI: A robust Public Key Infrastructure that links public keys to verified identities.
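The second pillar is easier to picture with a toy example. Below is a minimal sketch of a hash-chained, append-only log in plain Node.js. The entry shape, the `GENESIS` constant, and the `AUDITOR-1` label are assumptions made up for illustration; this is not a production ledger.

```javascript
const crypto = require('crypto');

const GENESIS = '0'.repeat(64); // hypothetical starting value for the chain

// Each entry's hash covers the previous hash plus its own data.
function entryHash(prevHash, data) {
  return crypto.createHash('sha256')
    .update(prevHash)
    .update(JSON.stringify(data))
    .digest('hex');
}

function append(log, data) {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS;
  log.push({ prevHash, data, hash: entryHash(prevHash, data) });
  return log;
}

// Recompute every link; any in-place edit breaks the chain from that point on.
function verifyChain(log) {
  let prevHash = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== entryHash(prevHash, entry.data)) return false;
    prevHash = entry.hash;
  }
  return true;
}

const log = [];
append(log, { action: 'created', by: 'BADGE-5521' });
append(log, { action: 'viewed', by: 'AUDITOR-1' });
const intactBefore = verifyChain(log);

log[0].data.by = 'SOMEONE-ELSE'; // simulate an in-place edit
const intactAfter = verifyChain(log);

console.log(intactBefore, intactAfter); // true false
```

A chain like this only detects edits if the latest hash is anchored somewhere the attacker cannot rewrite (signed, published, or stored in WORM storage). Otherwise someone with full access can recompute every hash after their change.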
The Provenance Workflow
In a secure architecture, the lifecycle of a document or data point looks like this:
[Data Creator]
│ Generates Payload + Computes SHA-256 Hash
▼
[Private Key Owner]
│ Signs Hash (ECDSA/RSA)
▼
[API Gateway]
│ Verifies Signature against Public Key Infrastructure (PKI)
▼
[Immutable Storage (S3 Object Lock / WORM)] & [Ledger/Blockchain]
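The gateway step in the diagram can be sketched roughly as follows. The in-memory `keyRegistry`, the `ingest` function, and the `bodycam-17` signer ID are hypothetical stand-ins for a real PKI lookup and API handler; only the `crypto` calls are Node built-ins.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory registry standing in for a real PKI lookup.
const keyRegistry = new Map();

function registerSigner(signerId) {
  const { privateKey, publicKey } = crypto.generateKeyPairSync('ec', {
    namedCurve: 'prime256v1',
  });
  keyRegistry.set(signerId, publicKey);
  return privateKey; // stays with the signer; the gateway only holds the public key
}

function signPayload(privateKey, payloadBytes) {
  // crypto.sign hashes the bytes with SHA-256, then signs the digest (ECDSA here).
  return crypto.sign('sha256', payloadBytes, privateKey);
}

// Gateway check: reject unknown signers and payloads that fail verification.
function ingest({ signerId, payloadBytes, signature }) {
  const publicKey = keyRegistry.get(signerId);
  if (!publicKey) return { accepted: false, reason: 'unknown signer' };
  const ok = crypto.verify('sha256', payloadBytes, publicKey, signature);
  if (!ok) return { accepted: false, reason: 'bad signature' };
  const digest = crypto.createHash('sha256').update(payloadBytes).digest('hex');
  return { accepted: true, digest }; // the digest is what a ledger entry would record
}

const deviceKey = registerSigner('bodycam-17');
const payloadBytes = Buffer.from('frame batch 0001');
const signature = signPayload(deviceKey, payloadBytes);

const good = ingest({ signerId: 'bodycam-17', payloadBytes, signature });
const tampered = ingest({
  signerId: 'bodycam-17',
  payloadBytes: Buffer.from('frame batch 0002'),
  signature,
});
const unknown = ingest({ signerId: 'nobody', payloadBytes, signature });

console.log(good.accepted, tampered.reason, unknown.reason);
```

In a real deployment the registry would be backed by certificates or DIDs rather than a `Map`, and accepted payloads would go on to immutable storage.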
Step-by-Step Implementation: Signing and Verifying Data in Node.js
Let's get practical. How do we implement this in code? We will write a lightweight Node.js implementation that mimics a secure data ingestion service. We will generate an asymmetric key pair, sign a payload (e.g., an officer's report or an automated system log), and verify that payload. If even a single character of the payload is altered after signing, whether by hand or by regenerating the file with an LLM, the verification will fail.
Step 1: Key Generation
First, let's write a script to generate an elliptic-curve key pair. We use the prime256v1 curve (also known as NIST P-256), which is widely supported for ECDSA signatures. Other curves such as secp256k1 would also work here.
const crypto = require('crypto');
const fs = require('fs');

function generateKeyPair() {
  const { privateKey, publicKey } = crypto.generateKeyPairSync('ec', {
    namedCurve: 'prime256v1', // NIST P-256 curve
    publicKeyEncoding: {
      type: 'spki',
      format: 'pem'
    },
    privateKeyEncoding: {
      type: 'pkcs8',
      format: 'pem'
    }
  });
  // Owner-only permissions for the private key (applied when the file is created).
  fs.writeFileSync('private_key.pem', privateKey, { mode: 0o600 });
  fs.writeFileSync('public_key.pem', publicKey);
  console.log("Keys successfully generated and saved!");
}

generateKeyPair();
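Once keys exist, a PKI needs a stable way to refer to them. One common convention (an assumption here, not something the script above does) is a fingerprint: the SHA-256 of the DER-encoded public key. The `fingerprint` helper below is a hypothetical name for a minimal sketch of that idea.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 over the DER-encoded SPKI form of a PEM public key.
function fingerprint(publicKeyPem) {
  const der = crypto.createPublicKey(publicKeyPem)
    .export({ type: 'spki', format: 'der' });
  return crypto.createHash('sha256').update(der).digest('hex');
}

// Generate a throwaway key pair in memory, with the same options as Step 1.
const { publicKey } = crypto.generateKeyPairSync('ec', {
  namedCurve: 'prime256v1',
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

const fp = fingerprint(publicKey);
console.log('public key fingerprint:', fp);
```

Recording this fingerprint next to a verified identity gives verifiers something short to compare before they trust a key.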
Step 2: Signing the Evidence/Data Payload
Now, let's simulate the creation of a data record. This payload contains the actual data, a timestamp, an author ID, and a unique identifier. We will sign this payload using our private key.
const crypto = require('crypto');
const fs = require('fs');

// Load private key
const privateKey = fs.readFileSync('private_key.pem', 'utf8');

// The data payload (could be a police report, sensor data, or financial transaction)
const evidencePayload = {
  caseId: "CASE-2023-8891A",
  officerId: "BADGE-5521",
  timestamp: "2023-10-27T10:30:00Z",
  evidenceNotes: "Suspect confessed to the charge during voluntary interview.",
  // Placeholder: this value is the SHA-256 of empty input. Use the real file's hash (e.g., the video file).
  hashOfOriginalFile: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
};

// Serialize the payload to a string for signing.
// Note: JSON.stringify is NOT a true canonical form. Its output depends on key
// insertion order, so signer and verifier must build the object identically.
const canonicalPayload = JSON.stringify(evidencePayload);

// Sign the payload (SHA-256 digest, ECDSA signature, base64-encoded)
const signer = crypto.createSign('SHA256');
signer.update(canonicalPayload);
signer.end();
const signature = signer.sign(privateKey, 'base64');

console.log("--- SIGNED PAYLOAD ---");
console.log(JSON.stringify({
  payload: evidencePayload,
  signature: signature
}, null, 2));
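One caveat about the serialization step: plain `JSON.stringify` output depends on the order in which keys were inserted, so two logically identical payloads can produce different signatures. Below is a minimal sketch of a key-sorting serializer. The `canonicalize` name is hypothetical, and it assumes plain JSON-safe values (no `undefined`, functions, or dates). It is not a full implementation of a standard such as RFC 8785 (JSON Canonicalization Scheme), which a production system should prefer.

```javascript
// Hypothetical helper: serialize with object keys sorted at every level.
function canonicalize(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return '{' + keys
      .map((k) => JSON.stringify(k) + ':' + canonicalize(value[k]))
      .join(',') + '}';
  }
  // Strings, numbers, booleans, and null use the standard JSON encoding.
  return JSON.stringify(value);
}

const a = { caseId: 'CASE-2023-8891A', officerId: 'BADGE-5521' };
const b = { officerId: 'BADGE-5521', caseId: 'CASE-2023-8891A' };

const plainDiffers = JSON.stringify(a) !== JSON.stringify(b);
const canonicalMatches = canonicalize(a) === canonicalize(b);

console.log(plainDiffers, canonicalMatches); // true true
```

If you adopt something like this, both the signing and the verifying side have to use the same serializer, or valid signatures will stop verifying.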
Step 3: Verifying the Chain of Custody
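When an auditor, a judge, or an external API consumer receives this data, they don't have to trust our database. They use our public key to verify that the payload hasn't been tampered with since the moment of signing. Keep in mind what this does and does not prove: a valid signature shows which key signed the record and that nothing has changed since, not that the content was truthful when it was signed.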
const crypto = require('crypto');
const fs = require('fs');

// Load public key
const publicKey = fs.readFileSync('public_key.pem', 'utf8');

// Received data packet (simulated)
const receivedData = {
  payload: {
    caseId: "CASE-2023-8891A",
    officerId: "BADGE-5521",
    timestamp: "2023-10-27T10:30:00Z",
    evidenceNotes: "Suspect confessed to the charge during voluntary interview.", // Change even one character to fail!
    hashOfOriginalFile: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  },
  signature: "MEUCIQD3t..." // Paste the full base64 signature generated in Step 2
};

function verifyData(dataPacket, pubKey) {
  // Must serialize exactly as the signer did (same key order, same JSON.stringify call)
  const canonicalPayload = JSON.stringify(dataPacket.payload);
  const verifier = crypto.createVerify('SHA256');
  verifier.update(canonicalPayload);
  verifier.end();
  return verifier.verify(pubKey, dataPacket.signature, 'base64');
}

const isAuthentic = verifyData(receivedData, publicKey);
console.log(`Is the data authentic and untampered? ${isAuthentic}`);
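Because the signature in the snippet above is truncated, here is a self-contained sketch that runs the whole round trip in memory: generate a key pair, sign, verify, then change a single character and verify again. The `signRecord` and `verifyRecord` helper names are made up for this example.

```javascript
const crypto = require('crypto');

// Throwaway in-memory key pair, so nothing is read from or written to disk.
const { privateKey, publicKey } = crypto.generateKeyPairSync('ec', {
  namedCurve: 'prime256v1',
});

function signRecord(record) {
  const signer = crypto.createSign('SHA256');
  signer.update(JSON.stringify(record));
  signer.end();
  return signer.sign(privateKey, 'base64');
}

function verifyRecord(record, signature) {
  const verifier = crypto.createVerify('SHA256');
  verifier.update(JSON.stringify(record));
  verifier.end();
  return verifier.verify(publicKey, signature, 'base64');
}

const record = {
  caseId: 'CASE-2023-8891A',
  officerId: 'BADGE-5521',
  evidenceNotes: 'Suspect confessed to the charge during voluntary interview.',
};
const signature = signRecord(record);

const untouched = verifyRecord(record, signature);

// Change a single character (lowercase "c" to uppercase "C") after signing.
const edited = {
  ...record,
  evidenceNotes: record.evidenceNotes.replace('confessed', 'Confessed'),
};
const afterEdit = verifyRecord(edited, signature);

console.log(untouched, afterEdit); // true false
```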
Taking It to Production: Standards and Tools
While the Node.js example demonstrates the core cryptographic mechanics, building a production-grade secure system requires established standards. You don't want to roll your own cryptographic protocols if you can avoid it.
1. W3C Verifiable Credentials (VCs)
If you are building systems that require robust identity and claim verification, look at the W3C Verifiable Credentials Data Model. VCs use cryptography to make digital credentials tamper-evident and cryptographically verifiable. This is perfect for verifying who generated a piece of data (e.g., a certified sensor, a verified user, or an authorized AI service agent).
2. C2PA (Coalition for Content Provenance and Authenticity)
To combat AI-generated deepfakes and forged media, major tech companies (including Adobe, Microsoft, and Intel) created the C2PA standard. C2PA allows content creators to embed asset metadata directly into media files (images, audio, video) using cryptographic signatures. If an image is edited, or if it was generated by an AI model (like DALL-E or Midjourney), the C2PA manifest securely tracks these assertions. Implementing C2PA SDKs in your media-handling microservices is becoming a modern security requirement.
3. Write-Once-Read-Many (WORM) Storage
Even with digital signatures, an attacker with admin access could delete the entire database record or signature file. To prevent this, critical log data and signatures should be written to WORM storage. AWS S3 offers "S3 Object Lock" in compliance mode, which prevents objects from being deleted or overwritten by anyone—including the AWS root account—for a defined retention period.
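As a rough illustration, the request for a locked upload might be assembled like this. The `buildLockedPutParams` helper, the bucket name, and the object key are hypothetical. The field names follow the S3 PutObject request as I understand it, so check them against the current AWS documentation before relying on them. The sketch only builds the parameter object and does not call AWS; the bucket would also need Object Lock enabled.

```javascript
// Hypothetical helper: build PutObject parameters for a compliance-mode locked object.
function buildLockedPutParams({ bucket, key, body, retainDays }) {
  const retainUntil = new Date(Date.now() + retainDays * 24 * 60 * 60 * 1000);
  return {
    Bucket: bucket,
    Key: key,
    Body: body,
    ChecksumAlgorithm: 'SHA256',            // uploads with a retention period need an integrity checksum
    ObjectLockMode: 'COMPLIANCE',           // compliance mode, as described above
    ObjectLockRetainUntilDate: retainUntil, // the object cannot be deleted before this date
  };
}

const params = buildLockedPutParams({
  bucket: 'example-evidence-bucket',        // hypothetical bucket name
  key: 'signatures/CASE-2023-8891A.json',   // hypothetical object key
  body: JSON.stringify({ signature: '...' }),
  retainDays: 365,
});

console.log(params.ObjectLockMode, params.ObjectLockRetainUntilDate.toISOString());
```

With the AWS SDK for JavaScript v3, an object like this would typically be passed to a `PutObjectCommand`; that wiring is left out here so the example stays runnable without credentials.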
Conclusion & Call to Action
The headline about police AI-generated evidence is a wake-up call. We are rapidly entering an era where seeing is no longer believing. Text, audio, images, and video can all be fabricated in seconds with staggering accuracy. As software engineers, we can no longer afford to build systems that treat data integrity as an afterthought or a "nice-to-have" feature on a security checklist.
We must design our systems with the assumption that the data layer *will* be targeted. Cryptographic signing at the edge, robust PKI, and immutable logging are the strongest tools we have for defending the integrity of our digital records.
What about you? How does your team handle data integrity and write-protection for sensitive records? Have you started implementing C2PA or cryptographic logging in your apps? Let’s talk about it in the comments below!
Until next time, keep your systems secure and your keys private. — Alex