Pattern Recognition Letters · In Press

Take a Peek
Efficient Encoder Adaptation for Few-Shot Semantic Segmentation via LoRA

Pasquale De Marinis · Gennaro Vessio · Giovanna Castellano

Department of Computer Science, University of Bari Aldo Moro, Italy

Abstract

Adapting the Encoder, Not Just the Decoder

Few-shot semantic segmentation (FSS) aims to segment novel classes in query images using only a small annotated support set. While prior research has mainly focused on improving decoders, the encoder's limited ability to extract meaningful features for unseen classes remains a key bottleneck. In this work, we introduce Take a Peek (TaP), a simple yet effective method that enhances encoder adaptability for both FSS and cross-domain FSS by inducing a lightweight feature-space shift conditioned on the support set.

TaP leverages Low-Rank Adaptation (LoRA) to fine-tune the encoder on the support set with minimal computational overhead, enabling fast adaptation to novel classes while mitigating catastrophic forgetting. Our method is model-agnostic and can be seamlessly integrated into existing FSS pipelines. Extensive experiments across multiple benchmarks — including COCO 20ⁱ, Pascal 5ⁱ, and cross-domain datasets (DeepGlobe, ISIC, Chest X-ray) — demonstrate that TaP consistently improves segmentation performance across diverse models and shot settings.

Overview

How TaP Works

Method

The Frozen Encoder Problem — and How TaP Solves It

Most FSS models freeze the encoder, adapting only the decoder. This leaves a critical gap: a pretrained encoder cannot discriminate novel classes it has never seen, regardless of how good the decoder is.

TaP fixes this at inference time. LoRA adapters fine-tune the encoder on the support set via the substitution strategy: each support image briefly acts as a pseudo-query, supervised by its own ground-truth mask, while the remaining support images serve as context. The decoder is never touched; the encoder simply arrives better prepared.

Decoder stays frozen

No decoder modification — TaP plugs into any existing FSS model without retraining.

LoRA keeps it efficient

Only the low-rank matrices $A$ and $B$ in $W' = W + \alpha AB$ are trained; the base weights $W$ stay frozen.
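The low-rank update can be illustrated with a minimal sketch in plain Python. The shapes, the value of α, and the helper names (`matmul`, `lora_weight`, `trainable_fraction`) are illustrative assumptions and are not taken from the TaP code; the layout follows the formula as written here, with $A$ of shape d_out × r and $B$ of shape r × d_in.

```python
# Minimal LoRA sketch in plain Python (no deep-learning framework).
# All names, shapes and values are illustrative assumptions.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_weight(W, A, B, alpha):
    """Return W' = W + alpha * (A @ B); W itself is left unchanged."""
    AB = matmul(A, B)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, AB)]

def trainable_fraction(d_out, d_in, r):
    """Share of one adapted layer's parameters held by A (d_out x r) and B (r x d_in)."""
    adapter = r * (d_out + d_in)
    return adapter / (d_out * d_in + adapter)

# Frozen 2x3 base weight with rank-1 adapters.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0], [2.0]]        # d_out x r
B = [[0.5, 0.0, -1.0]]    # r x d_in
W_adapted = lora_weight(W, A, B, alpha=2.0)
```

If either factor is all zeros, `lora_weight` returns the base weight unchanged, which is why LoRA adapters can start from the pretrained behaviour. For a single 1024 × 1024 layer with r = 8, `trainable_fraction(1024, 1024, 8)` comes to roughly 1.5% of that layer's parameters.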

Substitution provides supervision

Known support images act as pseudo-queries, giving a free training signal without any extra labelled data.

Adaptation Loop

The Substitution Strategy

In an N-way K-shot episode, each of the N×K support images takes a turn as a pseudo-query while the remaining support images provide the context. Its ground-truth mask supervises a forward–backward pass; only the LoRA adapters are updated, leaving the base encoder and decoder weights untouched.
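The rotation of support images can be sketched as a simple leave-one-out loop. This is only an illustration of the pairing logic: the function name `substitution_rounds` and the file-name strings are hypothetical, and the model's forward pass, loss, and LoRA update that would run inside each round are omitted.

```python
from typing import Iterator, List, Tuple

def substitution_rounds(support: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (pseudo_query, context) pairs, one per support image."""
    for i, pseudo_query in enumerate(support):
        # Every other support image stays in the context set for this round.
        context = support[:i] + support[i + 1:]
        yield pseudo_query, context

# 2-way 3-shot episode: N * K = 6 support images (names are hypothetical).
support = [f"class{c}_img{k}" for c in "AB" for k in range(3)]
rounds = list(substitution_rounds(support))

for pseudo_query, context in rounds:
    # In an actual adaptation step, the model would predict the mask of
    # `pseudo_query` from `context`, compare it with the known mask, and
    # update only the LoRA adapters.
    pass
```

One pass over `rounds` visits every support image exactly once as a pseudo-query, so no labelled data beyond the support set is needed.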

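Figure (animated diagram): support images of Class A and Class B are split into one pseudo-query and the remaining context support, then passed through the encoder with LoRA adapters (trainable, 🔥) and the decoder (frozen, ❄️). Each step cycles through four stages: select pseudo-query, forward pass, compute loss (focal loss), backprop into LoRA. Frame shown: step 1 / 10, outer iteration 1 / T.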
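The adaptation loop supervises the pseudo-query prediction with a focal loss. The page does not state which variant or hyperparameters TaP uses, so the sketch below assumes the standard binary form, $-(1 - p_t)^{\gamma} \log p_t$ averaged over pixels, with γ = 2 as a commonly used default. The function name and its arguments are illustrative.

```python
import math

def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over pixels: -(1 - p_t)^gamma * log(p_t).

    probs   -- predicted foreground probabilities, one per pixel
    targets -- ground-truth labels (1 = foreground, 0 = background)
    gamma   -- focusing parameter (assumed value; 0 gives plain cross-entropy)
    """
    total = 0.0
    for p, t in zip(probs, targets):
        # p_t is the probability assigned to the true label of this pixel.
        p_t = p if t == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

# Three pixels: two confident and correct, one uncertain.
loss = binary_focal_loss([0.9, 0.2, 0.6], [1, 0, 1])
```

The $(1 - p_t)^{\gamma}$ factor shrinks the contribution of pixels that are already classified confidently, so the update concentrates on the harder ones.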

Qualitative Analysis

Feature-Space Shift Across Iterations

As TaP adapts the encoder, pixel-level features from the query and support images progressively separate by class in the embedding space. The animation below shows the encoder output (last Swin-B scale, 1024-d, projected to 2D via t-SNE) and the corresponding segmentation prediction at each adaptation step.

Figure (animation): Encoder Feature-Space Evolution.

Visual Comparison

Before and After TaP Adaptation

2-way 3-shot episode on COCO 20ⁱ — DCAMA with Swin-B backbone.

Panels: Query Image · Vanilla (no TaP) · With TaP

Experimental Results

Consistent Gains Across Models & Benchmarks

All results are averaged over 5 runs × 1000 episodes. TaP is compared against the vanilla baseline (frozen encoder), Decoder FT, and AdaptiveFSS.
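As a rough illustration of the metric and the averaging, the sketch below computes the IoU of two binary masks and averages per-episode scores within each run before averaging across runs. The exact aggregation used for the benchmarks is not specified on this page (FSS evaluations often accumulate intersections and unions per class instead), so both the scheme and the function names are assumptions.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(p and t for p, t in zip(pred, target))
    union = sum(p or t for p, t in zip(pred, target))
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def mean_over_runs(runs):
    """Average per-episode scores within each run, then average the run means."""
    return mean(mean(episodes) for episodes in runs)

# Toy example: one 4-pixel mask pair, and 2 runs of 2 episodes each.
score = iou([1, 1, 0, 0], [1, 0, 1, 0])
overall = mean_over_runs([[0.5, 0.7], [0.6, 0.8]])
```

With the reported protocol, `runs` would hold 5 lists of 1000 episode scores each.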

BAM · COCO 2-way: +8.33 mIoU improvement, 5-shot

DCAMA · Pascal 2-way: mIoU improvement, 5-shot (value not rendered)

DMTNet · Chest X-ray: +20.65 mIoU improvement, 15-shot

Trainable params: 0.41% of total (r = 2³ for DCAMA)

COCO 20ⁱ — mean mIoU improvement over vanilla

Model	1-way 5-shot	2-way 5-shot
BAM	+7.14	+8.33
DCAMA	+1.74	+5.44
FPTrans	+0.66	+3.96
HDMNet	+1.66	+3.97
Label Anything	+3.32	+5.00

Cross-Domain FSS (DMTNet) — mean mIoU improvement

Dataset	3-shot	5-shot	10-shot	15-shot
DeepGlobe	+1.64	+2.42	+2.83	+4.55
ISIC	+3.26	+2.26	+4.01	+4.97
Chest X-ray	+13.76	+15.95	+18.28	+20.65

Citation

BibTeX

@article{demarinisTakePeekEfficient2026,
	title = {Take a peek: {Efficient} encoder adaptation for few-shot semantic segmentation via {LoRA}},
	volume = {207},
	issn = {0167-8655},
	shorttitle = {Take a peek},
	url = {https://www.sciencedirect.com/science/article/pii/S0167865526001996},
	doi = {10.1016/j.patrec.2026.06.003},
	journal = {Pattern Recognition Letters},
	author = {De Marinis, Pasquale and Vessio, Gennaro and Castellano, Giovanna},
	year = {2026},
	keywords = {Semantic segmentation, Few-shot learning, LoRA, Deep neural networks, Domain shift},
	pages = {47--54},
}
