Metadata gets stripped everywhere, so what even is the point of content provenance?

kind of a nihilistic take but hear me out

we talk a lot about content provenance, watermarking, content credentials, metadata standards etc. all good ideas in theory. but the reality is:

  • most major social media platforms strip metadata on upload
  • messaging apps compress and reprocess media
  • screenshots contain zero provenance information
  • right-click save only gets you the copy the site served, which has usually lost most of its embedded data already
  • even email attachments can lose metadata depending on the client

so even if we had perfect provenance technology, the actual distribution channels people use would destroy it. and that’s before bad actors intentionally strip it

feels like we’re building a lock for a door that doesn’t have walls. am i wrong? someone convince me provenance is solvable

You’re not wrong about the current state but I think you’re discounting the trajectory. The technical solutions for surviving platform processing exist (perceptual hashing, steganographic embedding, content fingerprinting). The bottleneck is adoption, not technology.
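To make the perceptual hashing idea a bit more concrete, here is a minimal sketch in Python. It assumes grayscale images as plain nested lists and uses a toy average hash, which is not what any particular platform or standard actually ships. The names `average_hash`, `hamming`, and `reprocess` are made up for illustration, and `reprocess` is only a crude stand-in for whatever a platform does to an upload.

```python
def average_hash(pixels, hash_size=8):
    """Toy average hash. `pixels` is a 2D list of grayscale values (0-255)
    whose height and width are multiples of `hash_size`."""
    block_h = len(pixels) // hash_size
    block_w = len(pixels[0]) // hash_size
    cells = []
    # Downscale to hash_size x hash_size by averaging each block of pixels.
    for by in range(hash_size):
        for bx in range(hash_size):
            total = sum(
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            )
            cells.append(total / (block_h * block_w))
    # One bit per cell: is it brighter than the overall average?
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def reprocess(pixels):
    """Hypothetical stand-in for platform processing:
    coarse requantization plus a small brightness shift."""
    return [[min(255, (p // 16) * 16 + 5) for p in row] for row in pixels]


# Two synthetic 32x32 images: a horizontal and a vertical gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
unrelated = [[y * 8 for _ in range(32)] for y in range(32)]

h_orig = average_hash(original)
print("reprocessed copy:", hamming(h_orig, average_hash(reprocess(original))))
print("unrelated image: ", hamming(h_orig, average_hash(unrelated)))
```

The reprocessed copy should land at a much smaller Hamming distance than the unrelated image. A registry could store the hash next to the credential at creation time and match later uploads within some distance threshold; choosing that threshold is the hard part and is not addressed here.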

Consider HTTPS adoption as an analogy. In 2013, less than 30% of web traffic was encrypted. Today it’s over 95%. It took years and pressure from both browsers and regulation. Content provenance could follow a similar path if the incentive structures align.

The screenshots point is the one that gets me. so much content is consumed and shared as screenshots now. tweet screenshots, article screenshots, message screenshots. there’s literally no provenance chain for a screenshot

but also i think provenance doesn’t need to be perfect to be useful. even if 50% of content has verifiable provenance, that shifts the default from “assume everything is real” to “be suspicious of anything without credentials.” that’s progress even if it’s incomplete

Technically the steganographic approach is the most promising. embed the credential information in the actual pixel or waveform data at a level that survives compression and reprocessing. it’s like an invisible watermark that carries provenance info

the tradeoff is that it slightly degrades quality and adds compute to both creation and verification. but for high-stakes content (news, evidence, official communications) that tradeoff seems worth it
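Here is a rough sketch of the embedding idea and the quality tradeoff, in Python. It is a toy quantization-based watermark (in the spirit of quantization index modulation) that hides one bit in the average brightness of each 8x8 tile. That is an assumption chosen for brevity; real schemes typically work in a frequency domain and add error correction. `STEP`, `BLOCK`, `embed`, `extract`, and `requantize` are hypothetical names, and `requantize` only roughly imitates lossy reprocessing.

```python
STEP = 16   # quantization step: larger survives more damage but is more visible
BLOCK = 8   # each BLOCK x BLOCK tile carries one bit


def _blocks(height, width):
    """Yield the top-left corner of each tile in row-major order."""
    for by in range(0, height, BLOCK):
        for bx in range(0, width, BLOCK):
            yield by, bx


def _block_mean(img, by, bx):
    total = sum(img[y][x] for y in range(by, by + BLOCK)
                for x in range(bx, bx + BLOCK))
    return total / (BLOCK * BLOCK)


def embed(img, bits):
    """Shift each tile so its mean sits on the lattice for its bit:
    multiples of STEP for 0, multiples of STEP plus STEP/2 for 1."""
    out = [row[:] for row in img]
    for bit, (by, bx) in zip(bits, _blocks(len(img), len(img[0]))):
        mean = _block_mean(img, by, bx)
        offset = STEP / 2 if bit else 0.0
        target = round((mean - offset) / STEP) * STEP + offset
        delta = target - mean
        for y in range(by, by + BLOCK):
            for x in range(bx, bx + BLOCK):
                # Clamping can break the mark in saturated regions.
                out[y][x] = min(255, max(0, round(img[y][x] + delta)))
    return out


def extract(img, n_bits):
    """Read each tile's mean and decide which lattice it is closer to."""
    bits = []
    for _, (by, bx) in zip(range(n_bits), _blocks(len(img), len(img[0]))):
        r = _block_mean(img, by, bx) % STEP
        bits.append(1 if STEP / 4 <= r < 3 * STEP / 4 else 0)
    return bits


def requantize(img, q=4):
    """Hypothetical stand-in for lossy reprocessing: snap pixels to multiples of q."""
    return [[min(255, round(p / q) * q) for p in row] for row in img]


# Synthetic 16x32 mid-gray test image (8 tiles, so 8 bits of payload).
image = [[64 + (x * 3 + y * 5) % 128 for x in range(32)] for y in range(16)]
payload = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed(image, payload)
print("recovered after reprocessing:", extract(requantize(marked), len(payload)))
print("max pixel change:", max(abs(a - b) for ra, rb in zip(image, marked)
                               for a, b in zip(ra, rb)))
```

The tradeoff shows up directly in `STEP`: a bigger step tolerates more damage to the tile means but moves pixels further from the original. In practice the payload would more likely be a short identifier pointing at a credential record than the credential itself.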

@Marc_Delrieu the HTTPS analogy is actually encouraging. I wonder what the equivalent forcing function would be for content credentials.