
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Artificial Neural Networks (ANNs) have revolutionized computer vision with strong performance, but their black-box nature creates significant challenges in domains requiring transparency, accountability, and regulatory compliance. The opacity of these systems hampers their adoption in critical applications where understanding the decision-making process is essential. Researchers also want to understand these models' internal mechanisms and to use such insights for debugging, model improvement, and exploring potential parallels with neuroscience. These factors have catalyzed the rapid development of explainable artificial intelligence (XAI) as a dedicated field focused on the interpretability of ANNs, bridging the gap between machine intelligence and human understanding.

Concept-based methods are among the most powerful XAI frameworks for revealing intelligible visual concepts within ANNs' complex activation patterns. Recent research casts concept extraction as a dictionary learning problem, in which activations are mapped to a higher-dimensional, sparser, and more interpretable concept space. Techniques like Non-negative Matrix Factorization (NMF) and K-Means have been used to reconstruct the original activations, while Sparse Autoencoders (SAEs) have recently gained prominence as a powerful alternative. SAEs achieve an impressive balance between sparsity and reconstruction quality, but they suffer from instability: training identical SAEs on the same data can produce different concept dictionaries, limiting their reliability and usefulness for meaningful interpretability analysis.

Researchers from Harvard University, York University, CNRS, and Google DeepMind have proposed two variants of Sparse Autoencoders to address this instability: the Archetypal-SAE (A-SAE) and its relaxed counterpart (RA-SAE). Both build on archetypal analysis to improve the stability and consistency of concept extraction. A-SAE constrains each dictionary atom to reside strictly within the convex hull of the training data, a geometric constraint that improves stability across different training runs. RA-SAE extends this framework with a small relaxation term, allowing slight deviations from the convex hull to increase modeling flexibility while maintaining stability.

The researchers evaluate their approach on five vision models, DINOv2, SigLIP, ViT, ConvNeXt, and ResNet50, all obtained from the timm library. They construct overcomplete dictionaries five times the size of the feature dimension (e.g., 768×5 for DINOv2 and 2048×5 for ConvNeXt), providing ample capacity for concept representation. Training runs on the full ImageNet dataset of roughly 1.28 million images, which yields over 60 million tokens per epoch for ConvNeXt and more than 250 million for DINOv2, for 50 epochs. RA-SAE builds on a TopK SAE architecture to keep sparsity levels consistent across experiments, and the matrix of candidate points that anchors the archetypal constraint is obtained by K-Means clustering of the entire dataset into 32,000 centroids.
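To make the construction concrete, here is a minimal PyTorch sketch of a TopK sparse autoencoder with a relaxed archetypal dictionary, following the description above. It is an illustration rather than the authors' implementation: the class and parameter names (RelaxedArchetypalSAE, relax_scale, and so on) are hypothetical, the candidate matrix C stands in for the 32,000 K-Means centroids, and details such as the exact parameterization of the mixing weights and the handling of the relaxation term may differ in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelaxedArchetypalSAE(nn.Module):
    """Sketch of a TopK SAE whose dictionary atoms are (relaxed) convex
    combinations of candidate points C, e.g., K-Means centroids of the data."""

    def __init__(self, d_model: int, n_concepts: int, candidates: torch.Tensor,
                 k: int = 32, relax_scale: float = 1.0):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_concepts)
        # Logits of the row-stochastic mixing matrix W (n_concepts x n_candidates).
        self.W_logits = nn.Parameter(torch.randn(n_concepts, candidates.shape[0]))
        # Small relaxation term allowing atoms to leave the convex hull (RA-SAE).
        self.relax = nn.Parameter(torch.zeros(n_concepts, d_model))
        self.relax_scale = relax_scale
        self.register_buffer("C", candidates)  # candidate points, kept frozen

    def dictionary(self) -> torch.Tensor:
        # Each row of softmax(W_logits) is non-negative and sums to 1, so
        # W @ C lies inside the convex hull of the candidates; the relaxation
        # term nudges atoms slightly outside it.
        W = F.softmax(self.W_logits, dim=-1)
        return W @ self.C + self.relax_scale * self.relax

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)                      # dense pre-codes
        topk = torch.topk(z, self.k, dim=-1)     # keep the k largest activations
        codes = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        x_hat = codes @ self.dictionary()        # reconstruct from sparse codes
        return x_hat, codes
```

The key point is the dictionary parameterization: because every atom is a convex combination of real data centroids, the dictionary stays pinned to the geometry of the data across training runs, and the small relax term is the escape hatch that distinguishes RA-SAE from the strict A-SAE.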
The results demonstrate significant performance differences between traditional approaches and the proposed methods. Classical dictionary learning algorithms and standard SAEs perform comparably but struggle to accurately recover the true generative factors in the tested datasets. In contrast, RA-SAE achieves higher accuracy in recovering the underlying object classes across all synthetic datasets used in the evaluation. Qualitatively, RA-SAE uncovers meaningful concepts, including shadow-based features linked to depth reasoning, context-dependent concepts such as a barber, and fine-grained edge detection in flower petals. It also learns more structured within-class distinctions than TopK-SAEs, separating features like rabbit ears, faces, and paws into distinct concepts rather than mixing them.

In conclusion, the researchers have introduced two variants of Sparse Autoencoders, A-SAE and its relaxed counterpart RA-SAE. A-SAE constrains dictionary atoms to the convex hull of the training data, enhancing stability while preserving expressive power, and RA-SAE balances reconstruction quality with meaningful concept discovery in large-scale vision models. To evaluate these approaches, the team developed novel metrics and benchmarks inspired by identifiability theory, providing a systematic framework for measuring dictionary quality and concept disentanglement. Beyond computer vision, A-SAE lays a foundation for more reliable concept discovery in broader modalities, including LLMs and other structured data domains.

Check out the Paper. All credit for this research goes to the researchers of this project.
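The paper's evaluation relies on new identifiability-inspired metrics, but the core stability property is straightforward to probe in code. Below is a small, hypothetical NumPy sketch (not the authors' metric) that scores how consistent two concept dictionaries from independent training runs are, by optimally matching atoms on cosine similarity with the Hungarian algorithm; dictionaries that are identical up to a permutation score 1.0. Under a measure like this, the run-to-run drift of standard SAEs shows up as a low score, while archetypal dictionaries anchored to the data's convex hull should land much closer to 1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dictionary_stability(D1: np.ndarray, D2: np.ndarray) -> float:
    """Agreement between two concept dictionaries (n_concepts x d_model)
    from independent runs: optimally match atoms by cosine similarity and
    return the mean similarity of matched pairs (1.0 = identical up to
    permutation, ~0.0 = unrelated)."""
    D1 = D1 / np.linalg.norm(D1, axis=1, keepdims=True)
    D2 = D2 / np.linalg.norm(D2, axis=1, keepdims=True)
    sim = D1 @ D2.T                           # pairwise cosine similarities
    rows, cols = linear_sum_assignment(-sim)  # maximize total matched similarity
    return float(sim[rows, cols].mean())
```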