ADu2021/cooper-spatial-intelligence

Name: cooper-spatial-intelligence
Author: ADu2021

skills/skillxiv-v0.0.2-claude-opus-4.6/cooper-spatial-intelligence/SKILL.md

npx skillsauth add ADu2021/skillXiv cooper-spatial-intelligence

Overview

COOPER unifies cooperative perception and reasoning through a two-stage training approach that develops both auxiliary modality generation and adaptive reasoning capabilities. Rather than treating perception and reasoning separately, the model learns to generate depth and segmentation maps while developing interleaved reasoning strategies.

When to Use

Multimodal tasks requiring strong 3D spatial understanding
Applications needing distance and size estimation from images
Vision-language models that struggle with spatial relationships
Scenarios requiring reasoning over spatial properties (volume, distance, orientation)
Tasks involving scene understanding with geometric constraints

When NOT to Use

2D image analysis where depth adds no value
Tasks not requiring spatial reasoning
Models already achieving satisfactory spatial understanding
Real-time applications where auxiliary modality generation adds latency
Scenarios with limited 3D training data

Core Technique

The model is trained in two stages, covering auxiliary modality generation and adaptive reasoning. The sketch below shows how the pieces fit together at inference time:

# Unified spatial reasoning architecture (illustrative sketch).
# DepthDecoder, SegmentationDecoder, and the helper methods process_depth,
# process_segmentation, estimate_spatial_complexity, multi_step_reasoning, and
# direct_reasoning are placeholders that must be defined elsewhere.
import torch
import torch.nn as nn


class ReasoningAdapter(nn.Module):
    """
    Adaptive interleaved reasoning strategies.
    Routes reasoning based on spatial complexity.
    """

    def __init__(self, complexity_threshold=0.7):
        super().__init__()
        self.complexity_threshold = complexity_threshold

    def compute_path(self, features, question):
        complexity_score = self.estimate_spatial_complexity(question)
        if complexity_score > self.complexity_threshold:
            # Multi-step reasoning for complex spatial questions
            return self.multi_step_reasoning(features, question)
        # Direct reasoning for simple questions
        return self.direct_reasoning(features, question)


class CooperativeSpatialModel(nn.Module):
    def __init__(self, vllm_backbone):
        super().__init__()
        # Vision-language backbone, assumed to expose encode_image() and
        # decode_with_path(). "vllm" names the backbone, not the vLLM engine.
        self.vllm = vllm_backbone

        # Auxiliary modality generators
        self.depth_generator = DepthDecoder()
        self.segmentation_generator = SegmentationDecoder()

        # Adaptive reasoning module (a separate class, so its name no longer
        # collides with a method on this model)
        self.reasoning_adapter = ReasoningAdapter()

    def forward(self, image, question):
        """
        Unified perception and reasoning for spatial intelligence.
        Generates auxiliary modalities and performs adaptive reasoning.
        """
        # Extract visual features
        features = self.vllm.encode_image(image)

        # Generate auxiliary modalities
        depth_map = self.depth_generator(features)
        segmentation = self.segmentation_generator(features)

        # Integrate auxiliary modalities with the visual features
        enhanced_features = self.integrate_modalities(
            features, depth_map, segmentation
        )

        # Adaptive interleaved reasoning
        reasoning_path = self.reasoning_adapter.compute_path(
            enhanced_features, question
        )

        # Generate answer with spatial reasoning
        answer = self.vllm.decode_with_path(
            enhanced_features,
            question,
            reasoning_path
        )

        return answer, depth_map, segmentation

    def integrate_modalities(self, visual, depth, segmentation):
        """
        Combine visual understanding with spatial auxiliary modalities.
        Learning to generate these modalities helps internalize spatial knowledge.
        """
        # Depth provides scale and distance information
        depth_features = self.process_depth(depth)
        # Segmentation provides object boundaries and relationships
        seg_features = self.process_segmentation(segmentation)

        # Multi-stream fusion: concatenate along the feature dimension
        # (all three tensors must agree on every other dimension)
        combined = torch.cat([visual, depth_features, seg_features], dim=-1)
        return combined

Two-stage training: first learn auxiliary modality generation, then jointly optimize with adaptive reasoning.
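The two-stage schedule can be sketched as a small configuration helper. This is a minimal illustration rather than the paper's training code: the component names (`depth_generator`, `segmentation_generator`, `reasoning_adapter`), the loss names, and the choice of which parts train in each stage are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Which components train and which losses apply in one training stage."""
    name: str
    trainable: tuple
    losses: tuple


# Hypothetical component and loss names, chosen only for illustration.
STAGES = (
    StageConfig(
        name="stage1_auxiliary_generation",
        trainable=("depth_generator", "segmentation_generator"),
        losses=("depth_loss", "segmentation_loss"),
    ),
    StageConfig(
        name="stage2_joint_reasoning",
        trainable=("depth_generator", "segmentation_generator", "reasoning_adapter"),
        losses=("depth_loss", "segmentation_loss", "answer_loss"),
    ),
)


def total_loss(stage, loss_values):
    """Sum only the loss terms that are active in the given stage."""
    return sum(loss_values[name] for name in stage.losses)


if __name__ == "__main__":
    step_losses = {"depth_loss": 0.5, "segmentation_loss": 0.25, "answer_loss": 1.0}
    for stage in STAGES:
        print(stage.name, total_loss(stage, step_losses))
```

Keeping the stage definition as data makes it easy to see that the second stage adds the reasoning objective on top of the auxiliary losses instead of replacing them, which is one plausible reading of "jointly optimize".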

Key Results

6.91% improvement in spatial reasoning tasks
7.92% improvement on distance and size estimation
General performance preservation across other tasks
Effective integration of depth and segmentation signals

Implementation Notes

Auxiliary modalities (depth, segmentation) aid spatial internalization
Adaptive reasoning interleaves steps based on question complexity
Two-stage training balances modality generation with reasoning
Preserves compatibility with the underlying vision-language model (VLM) architecture
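The complexity-based routing mentioned in these notes can be illustrated with a toy scorer. The keyword heuristic, the cue list, and the function names below are hypothetical stand-ins: the source does not say how spatial complexity is estimated, and only the 0.7 threshold is taken from the Core Technique sketch.

```python
import re

# Hypothetical cue words; a real system would likely use a learned scorer.
SPATIAL_CUES = frozenset({
    "distance", "far", "closer", "behind", "front", "left", "right",
    "volume", "size", "taller", "larger", "orientation", "between",
})


def estimate_spatial_complexity(question, saturation=3):
    """Return a score in [0, 1]: distinct cue words found, capped at `saturation`."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    hits = len(words & SPATIAL_CUES)
    return min(hits / saturation, 1.0)


def choose_strategy(question, threshold=0.7):
    """Route to multi-step reasoning only when the score exceeds the threshold."""
    if estimate_spatial_complexity(question) > threshold:
        return "multi_step"
    return "direct"


if __name__ == "__main__":
    print(choose_strategy("What color is the mug?"))  # direct
    print(choose_strategy(
        "Is the chair behind the table closer or larger than the lamp?"
    ))  # multi_step
```

With these defaults a question needs three distinct cue words to cross the threshold, so simple questions skip the longer reasoning path and avoid its extra latency.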

References

Original paper: https://arxiv.org/abs/2512.04563
Focus: Spatial reasoning in multimodal models
Domain: Vision-language models, 3D understanding

Enhance spatial reasoning in multimodal LLMs by integrating depth and segmentation as auxiliary modalities with adaptive reasoning strategies. COOPER reports a 6.91% improvement in spatial reasoning; use it when you need 3D-aware vision-language capabilities.

2 stars

data-ai

Updated Apr 17, 2026

npx skillsauth add ADu2021/skillXiv cooper-spatial-intelligence

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 17, 2026, 5:31 AM (24.4 s, 1 file scanned)

SKILL.md

name: cooper-spatial-intelligence
title: COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence
version: 0.0.2
engine: skillxiv-v0.0.2-claude-opus-4.6
license: MIT
url: https://arxiv.org/abs/2512.04563
keywords: [spatial reasoning, 3D understanding, auxiliary modalities, multimodal LLMs, depth and segmentation]
description: Enhance spatial reasoning in multimodal LLMs by integrating depth and segmentation as auxiliary modalities with adaptive reasoning strategies. COOPER reports a 6.91% improvement in spatial reasoning; use it when you need 3D-aware vision-language capabilities.

Related Skills

ADu2021/flow-map-trajectory-tilting

testing

Verified | Trusted | Community

Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.

2 | SKILL.md | Updated Apr 17, 2026

ADu2021/flexible-data-mixture-of-experts

testing

Verified | Trusted | Community

Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. Reports a 41% performance improvement while allowing users to opt out of specific data sources without retraining.

2 | SKILL.md | Updated Apr 17, 2026

ADu2021/flexibility-trap-diffusion-reasoning

data-ai

Verified | Trusted | Community

Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.

2 | SKILL.md | Updated Apr 17, 2026

ADu2021/flex-continuous-agent-evolution

devops

Verified | Trusted | Community

Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures, achieving a 23% improvement on reasoning without gradient-based parameter updates or external training.

2 | SKILL.md | Updated Apr 17, 2026

Install manually

# Clone the repo
git clone https://github.com/ADu2021/skillXiv.git

# Copy into Claude Code skills folder (global)
cp -r skillXiv/skills/skillxiv-v0.0.2-claude-opus-4.6/cooper-spatial-intelligence ~/.claude/skills/
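
After copying, it can help to confirm that the skill landed where expected. The check below is a small sketch, assuming the skill is recognised by the presence of a `SKILL.md` file in `<skills folder>/<skill name>/`, which matches the layout produced by the `cp -r` command above. The helper name `skill_installed` is made up for this example.

```python
from pathlib import Path


def skill_installed(skills_root, skill_name):
    """Return True if <skills_root>/<skill_name>/SKILL.md exists as a file."""
    return (Path(skills_root) / skill_name / "SKILL.md").is_file()


if __name__ == "__main__":
    # Global skills folder used by the copy command above.
    root = Path.home() / ".claude" / "skills"
    print(skill_installed(root, "cooper-spatial-intelligence"))
```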

Repository

ADu2021/skillXiv

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT