Genuine question because i can't find reliable benchmarks anywhere. everyone claims 95%+ accuracy on their marketing pages but whenever i actually test these tools with known images the results are way more inconsistent.
tested a batch of 30 images last week. 15 real photos from my camera (sony a7iv, raw files converted to jpeg), 15 generated with midjourney v6 and flux pro
results: the best performing tool correctly identified 12/15 real and 11/15 ai. that's 23/30 overall, so about 77% accuracy. it also means it flagged 3 of my own photos as ai generated, including a street photography shot from tokyo that i literally have the raw file for
i get that this is hard technically but the gap between marketed accuracy and real world performance is massive. are there any independent benchmarks out there?
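for anyone who wants to redo the math on the 30-image test above, here's a rough sketch. the counts are the ones from the post; the Wilson interval is something i'm adding (the `wilson_interval` helper is my own, not from any of the tools) to show how little a 30-image sample pins down.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

# Counts reported in the post for the best performing tool.
real_correct, real_total = 12, 15
ai_correct, ai_total = 11, 15

correct = real_correct + ai_correct
total = real_total + ai_total

accuracy = correct / total
# Real photos wrongly flagged as AI.
false_positive_rate = (real_total - real_correct) / real_total
# AI images that slipped through as "real".
miss_rate = (ai_total - ai_correct) / ai_total

low, high = wilson_interval(correct, total)

print(f"accuracy: {accuracy:.1%}")
print(f"false positive rate on real photos: {false_positive_rate:.1%}")
print(f"miss rate on ai images: {miss_rate:.1%}")
print(f"95% interval for accuracy: {low:.1%} to {high:.1%}")
```

with only 30 images the interval should come out very wide (roughly 59% to 88%), so a small test like this can show the marketing numbers look off but can't say precisely how accurate the tool really is.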
Did something similar for a class project. Tools were decent at catching dalle but really struggled with flux. Also compressed images (screenshots, social media saves) were way harder for the tools. JPEG compression seems to destroy whatever signals they're looking for.
from a practical standpoint the accuracy question matters a lot for content platforms. if you're running a site and want to flag ai images, even 90% accuracy means roughly 10% of your real contributors get wrongly flagged (assuming the errors are split evenly between real and ai images). at scale that’s hundreds or thousands of angry photographers
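quick sketch of that scale problem. every number here is made up for illustration (upload volume, share of ai images, and a detector that is right 90% of the time on both real and ai images), so plug in your own.

```python
def flag_stats(uploads, ai_share, sensitivity, specificity):
    """Estimate flag counts for a detector run over a batch of uploads.

    sensitivity: fraction of AI images correctly flagged.
    specificity: fraction of real images correctly left alone.
    """
    ai_images = round(uploads * ai_share)
    real_images = uploads - ai_images

    true_flags = round(ai_images * sensitivity)
    false_flags = round(real_images * (1 - specificity))

    # Of everything flagged, what share is actually AI?
    precision = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, precision

# Hypothetical platform: 100,000 uploads, 5% of them AI generated.
true_flags, false_flags, precision = flag_stats(
    uploads=100_000, ai_share=0.05, sensitivity=0.90, specificity=0.90
)

print(f"ai images correctly flagged: {true_flags}")
print(f"real photos wrongly flagged: {false_flags}")
print(f"share of flags that are actually ai: {precision:.1%}")
```

under those assumptions the wrongly flagged real photos outnumber the correctly flagged ai images, because real uploads are the vast majority. that's why a single headline accuracy number says so little without the false positive rate and the mix of images it was tested on.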
i think the industry needs to move toward provenance-based verification rather than detection. detecting after the fact is always going to be a losing game as generators improve
the marketing claims drive me crazy. “99.8% accuracy” tested on what dataset? generated by which models? at what resolution? none of them publish their methodology
i track this stuff for work and the honest answer is: no public, independent, regularly updated benchmark exists for image detection. the closest things are academic papers that are usually 6-12 months behind the latest generators
@SilentBean64 yeah the provenance approach makes more sense long term. if the camera/device signs the image at creation and that signature carries through the pipeline you don't need detection at all
problem is adoption. even if adobe and google implement it, it means nothing until every platform respects it
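to make the sign-at-creation idea concrete, here's a toy sketch. big assumption: it uses an HMAC with a shared secret from the python standard library as a stand-in. real provenance schemes sign with public-key certificates and attach the signature as metadata, which this does not do. `DEVICE_KEY` and the image bytes are made up.

```python
import hashlib
import hmac

# Hypothetical secret held by the camera. A real scheme would use a
# private key and certificate instead of a shared secret.
DEVICE_KEY = b"hypothetical-device-key"

def sign_image(image_bytes, key=DEVICE_KEY):
    """Return a hex signature over the exact image bytes."""
    return hmac.new(key, image_bytes, hashlib.sha256).hexdigest()

def verify_image(image_bytes, signature, key=DEVICE_KEY):
    """Check that the bytes still match the signature made at capture."""
    expected = hmac.new(key, image_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Stand-in for a file straight off the camera.
original = b"\xff\xd8 fake jpeg bytes \xff\xd9"
signature = sign_image(original)

# Any change to the bytes (recompression, resize, screenshot) breaks the match.
recompressed = original + b"\x00"

print("original verifies:", verify_image(original, signature))
print("recompressed verifies:", verify_image(recompressed, signature))
```

the second check fails, which is the adoption problem in miniature: a signature over raw bytes only survives if every step in the pipeline either leaves the file untouched or re-signs it with a record of what changed.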