Simon Jenni
Senior Research Scientist at Adobe Research

I build vision-language models and multimodal representations that power search, retrieval, and content understanding across Adobe's creative products. My work also spans visual fingerprinting and content authenticity. Previously, I completed my PhD at the University of Bern, supervised by Paolo Favaro.

Research

Vision-Language Models

Developing multimodal embedding models for visual search, retrieval, aesthetic prediction, and content recommendation. Applications include spatial understanding for intelligent reframing and content repurposing workflows.

Embeddings · VLMs · Retrieval · Reframing

Visual Fingerprinting & Matching

Robust image and video fingerprinting applied to content provenance, training data deduplication, asset management, and platform moderation at scale.

Fingerprinting · Deduplication · Provenance

Self-Supervised Learning

Training strategies for learning visual and multimodal representations without human annotation, from images, video, and audio.

Contrastive · Video · Audio

Product Impact

Research shipped into products
Premiere Pro — Search Panel
AI-powered media intelligence for searching and navigating video clips using natural language and visual similarity.
Vision-language embeddings
Lightroom — Semantic Search
Find photos using natural language descriptions, going beyond keywords and metadata to understand visual content.
Vision-language embeddings
Lightroom — Auto Stack & Culling
Automatically group visually similar photos and identify the best shots using visual similarity clustering and aesthetic quality prediction.
Visual similarity · Aesthetic prediction
Firefly — Reframe Video
Intelligently reframe video content for different aspect ratios in Adobe Firefly's creative production tools.
Conditional embeddings
Durable Content Credentials
Image fingerprinting for verifying content provenance, combining secure metadata, watermarking, and perceptual hashing.
Visual fingerprinting
AEM Assets — Smart Tags
AI-powered automatic tagging of images in Adobe Experience Manager, enabling large-scale asset discovery and organization.
Vision-language embeddings

Selected Publications

Full list on Google Scholar
2025 – 2026
Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
J. Lu, S. Jenni, K. Kafle, J. Shi, H. Zhao, Y. Fu
ICLR 2026
The Photographer's Eye: Teaching Multimodal LLMs to See and Critique Like Photographers
D. Qi, H. Zhao, J. Shi, S. Jenni, Y. Fan, F. Dernoncourt, S. Cohen, S. Li
CVPR 2025
ViDROP: Video Dense Representation through Spatio-Temporal Sparsity
S. Sameni, S. Jenni, P. Favaro
CVPR 2025
The Indra Representation Hypothesis
J. Lu, H. Wang, K. Yang, Y. Zhang, S. Jenni, Y. Fu
NeurIPS 2025
Improving Large Vision and Language Models by Learning from a Panel of Peers
J. Hernandez, J. Shi, S. Jenni, V. Ordonez, K. Kafle
ICCV 2025
Magnet: Augmenting Generative Decoders with Representation Learning and Infilling
S. Khosla, A. Tiwari, K. Kafle, S. Jenni, H. Zhao, J. Collomosse, J. Shi
ACL 2025
2024
Building Vision-Language Models on Solid Foundations with Masked Distillation
S. Sameni, K. Kafle, H. Tan, S. Jenni
CVPR 2024
21 citations
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
I.R. Dave, F.C. Heilbron, M. Shah, S. Jenni
ECCV 2024 Oral
5 citations
FINEMATCH: Aspect-Based Fine-Grained Image and Text Mismatch Detection
H. Hua, J. Shi, K. Kafle, S. Jenni, D. Zhang, J. Collomosse, S. Cohen, J. Luo
ECCV 2024
28 citations
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
G. Kwon, S. Jenni, D. Li, J.Y. Lee, J.C. Ye, F.C. Heilbron
CVPR 2024
31 citations
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
I.R. Dave, S. Jenni, M. Shah
AAAI 2024
18 citations
2021 – 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
S. Jenni, A. Black, J. Collomosse
AAAI 2023
32 citations
Meta-Personalizing Vision-Language Models to Find Named Instances in Video
C.H. Yeh, B. Russell, J. Sivic, F.C. Heilbron, S. Jenni
CVPR 2023
22 citations
Time-Equivariant Contrastive Video Representation Learning
S. Jenni, H. Jin
ICCV 2021 Oral
76 citations
2018 – 2020
Video Representation Learning by Recognizing Temporal Transformations
S. Jenni, G. Meishvili, P. Favaro
ECCV 2020
177 citations
Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
S. Jenni, H. Jin, P. Favaro
CVPR 2020 Oral
60 citations
Deep Bilevel Learning
S. Jenni, P. Favaro
ECCV 2018
168 citations
Self-Supervised Feature Learning by Learning to Spot Artifacts
S. Jenni, P. Favaro
CVPR 2018 Spotlight
160 citations

News

Jan 2026
Paper accepted to ICLR 2026
2025
Papers at CVPR (×2), ICCV, NeurIPS, and ACL
Jan 2025
Promoted to Senior Research Scientist at Adobe Research
Oct 2024
Video fingerprinting technology featured in Project KnowHow Sneak at Adobe MAX 2024
2024
Papers at CVPR (×2), ECCV (×2, incl. oral), and AAAI
2023
Papers at CVPR, ICCV, and AAAI
2022
Faculty Prize for best PhD dissertation, University of Bern
Dec 2021
Best Paper Award at CVMP 2021
May 2021
Joined Adobe Research

1,100+ citations · 29 publications · 6 granted patents · h-index 16

Awards

2022

Best PhD Dissertation

Faculty Prize, University of Bern

2021

Best Paper Award

CVMP 2021

2019 · 2020 · 2022

Outstanding Reviewer

CVPR 2019 (top 1%), ECCV 2020, ECCV 2022

2018

Best Poster Award

PRAIRIE/MIAI AI Summer School

2017

Best Master Thesis

Joint Alumni Association in Computer Science