Abstract: Anomaly detection usually assumes that abnormality is an intrinsic property of an observation. A defect is a defect, and a rare object is rare, regardless of where it appears. Many real-world anomalies do not work this way. A runner on a track is normal, but the same runner on a highway is not. The subject is unchanged; only the context makes it anomalous. This setting, long recognized as contextual anomaly detection, remains largely underexplored in modern vision-language systems. The difficulty is not merely empirical; it is formal. When anomaly labels depend on the relation between a subject and its context, the problem is provably non-identifiable for any detector that reasons from a global representation conflating subject and context: two different subject-context configurations can map to the same embedding while requiring opposite labels, and no such detector can be correct on both. This impossibility motivates a different formulation: instead of asking whether an observation deviates from a global notion of normality, the model should ask whether subjects are compatible with their surrounding context. We define this as conditional compatibility learning. We instantiate this framework in CC-CLIP, a vision-language architecture that learns disentangled subject- and context-aware representations from a single image and fuses visual evidence through text-conditioned attention. CC-CLIP achieves state-of-the-art results on real-world contextual anomaly detection, substantially outperforming all existing CLIP-based and context-reasoning baselines. A single-branch variant of CC-CLIP also achieves competitive performance on structural anomaly benchmarks.
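The non-identifiability argument can be made concrete with a toy example. The sketch below is a minimal illustration under stated assumptions, not CC-CLIP itself: mean pooling stands in for "a global representation that conflates subject and context", and a hypothetical bilinear score `s^T W c` stands in for a learned compatibility function. All feature vectors, the matrix `W`, and the labels are hypothetical values chosen so that two configurations collide; in particular, "car on a track" is assumed normal (a race track) purely for illustration.

```python
"""Toy illustration of non-identifiability under a conflating global embedding.

Hypothetical throughout: the 2-D features, the labels, and the compatibility
matrix W are illustrative values, not taken from CC-CLIP.
"""
from typing import Callable, Dict, List, Tuple

Vec = Tuple[float, float]

# Hypothetical 2-D features shared by subjects and contexts.
FEATURES: Dict[str, Vec] = {
    "runner": (1.0, 0.0),
    "car": (0.0, 1.0),
    "track": (1.0, 1.0),
    "highway": (0.0, 2.0),
}

# (subject, context) -> label, where 1 = anomalous.
# "car on track" is ASSUMED normal here (e.g. a race track).
CONFIGS: Dict[Tuple[str, str], int] = {
    ("runner", "highway"): 1,
    ("car", "track"): 0,
}


def global_embedding(subject: str, context: str) -> Vec:
    """Mean-pool subject and context features, discarding which role is which."""
    s, c = FEATURES[subject], FEATURES[context]
    return ((s[0] + c[0]) / 2, (s[1] + c[1]) / 2)


# Hypothetical bilinear compatibility weights: score(s, c) = s^T W c.
W = ((2.0, -1.0), (0.0, 1.0))


def compatibility(subject: str, context: str) -> float:
    """Score how well a subject fits its context (higher = more compatible)."""
    s, c = FEATURES[subject], FEATURES[context]
    return sum(s[i] * W[i][j] * c[j] for i in range(2) for j in range(2))


def conditional_detector(subject: str, context: str) -> int:
    """Flag an anomaly when the subject is incompatible with its context."""
    return int(compatibility(subject, context) < 0.0)


def errors(detector: Callable[[str, str], int]) -> int:
    """Count misclassified configurations in CONFIGS."""
    return sum(detector(s, c) != y for (s, c), y in CONFIGS.items())


if __name__ == "__main__":
    a = global_embedding("runner", "highway")
    b = global_embedding("car", "track")
    print("pooled embeddings:", a, b)  # both are (0.5, 1.0)

    # Any detector that sees only the pooled embedding outputs the same label
    # for both configurations, so with opposite labels it errs on one of them.
    global_detectors: List[Callable[[Vec], int]] = [
        lambda z: int(z[0] > 0.4),
        lambda z: int(z[1] > 1.5),
        lambda z: int(z[0] + z[1] > 2.0),
    ]
    for h in global_detectors:
        print("global detector errors:", errors(lambda s, c: h(global_embedding(s, c))))

    # Scoring the (subject, context) pair directly separates the two cases.
    print("conditional detector errors:", errors(conditional_detector))
```

Each of the three global detectors makes exactly one error, because `runner`+`highway` and `car`+`track` pool to the same vector. The pairwise score keeps subject and context in separate roles, so it can assign opposite labels to the two configurations. This mirrors the motivation for conditional compatibility learning only in spirit; the abstract does not specify that CC-CLIP uses a bilinear score.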
| Comments: | Preprint. 9 pages main text, plus appendix |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG) |
| ACM classes: | I.2.6; I.2.10 |
| Cite as: | arXiv:2601.22868 [cs.CV] (or arXiv:2601.22868v3 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2601.22868 (arXiv-issued DOI via DataCite) |
Submission history
From: Shashank Mishra
[v1] Fri, 30 Jan 2026 11:48:20 UTC (24,346 KB)
[v2] Sat, 28 Feb 2026 18:09:03 UTC (28,454 KB)
[v3] Wed, 13 May 2026 14:33:56 UTC (20,631 KB)
