Accepted Papers
17 Papers Accepted. Congratulations!
- #4 Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities
- #5 Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher
- #7 CODA-Mask: Contrastive and Adaptive Mask Aware Open-set Semantic Segmentation
- #9 Beyond Vision: Holistic World Models
- #13 Multimodal ELBO with Diffusion Decoders
- #16 Hierarchy Matters: Learning Vision–Language Representations in Hyperbolic Space
- #17 GROVE: Geometry-Aware Optimization for Robust Vision-Language Models
- #18 Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry
- #19 On The Application of Linear Attention in Multimodal Transformers
- #20 VIRTUE: Versatile Video Retrieval Through Unified Embeddings
- #21 AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model
- #22 CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
- #24 UniCanvas: A Diffusion-based Unified Model with Text-in-Image Joint Generation
- #31 Omni-Modal Dissonance Benchmark: Systematically Breaking Modality Consensus to Probe Robustness and Calibrated Abstention
- #33 TriTS: Time Series Forecasting from a Multimodal Perspective
- #34 MetaLex: Probing Literal-to-Figurative Grounding in Vision-Language Models
- #38 Do Prefix Attractors Cause Hallucination in LVLMs?