Research · Metric AI Lab

Direction 01

Primary Focus · Physical AI

Where AI meets the real world.

We believe the most interesting unsolved problems in AI are about understanding and acting in physical environments. Our research sits at the intersection of two hard problems.

Thread A

Real-World Intelligence

Training VLMs to reason about spatial structure, scene geometry, and physical constraints — moving beyond "what is this" to "where is this, how does it move, what will happen next."

Spatial intelligence and 3D scene understanding in VLMs
Embodied reasoning — what can and can't happen in a physical scene
Depth, affordance, and object permanence in vision-language models

Thread B

Robotic Control & Planning

Joint Embedding Predictive Architectures (JEPA) for long-horizon robotic planning. Teaching models to predict useful representations of future states, not just next tokens.

JEPA-based planners for multi-step manipulation tasks
Vision-Language-Action models for dexterous control
Long-horizon prediction without step-by-step supervision
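
To make "predict representations of future states, not just next tokens" concrete, here is a minimal sketch of a JEPA-style objective. It is an illustration under stated assumptions, not the lab's architecture: the linear encoders, the action-conditioned predictor, and every name (`latent_prediction_loss`, `ema_update`, the dimensions) are hypothetical. The loss is computed between a predicted and a target embedding, so nothing is reconstructed in observation space.

```python
import random

def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def make_matrix(rows, cols, rng):
    """Small random matrix standing in for a learned network (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def latent_prediction_loss(encoder, target_encoder, predictor,
                           obs_now, action, obs_future):
    """Mean squared error between predicted and target future embeddings."""
    z_now = matvec(encoder, obs_now)                # embed the current observation
    z_target = matvec(target_encoder, obs_future)   # embed the future observation;
                                                    # treated as a constant in training
    z_pred = matvec(predictor, z_now + action)      # predict from latent state + action
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)

def ema_update(target_encoder, encoder, decay=0.99):
    """Move the target encoder slowly toward the online encoder."""
    return [[decay * t + (1.0 - decay) * e for t, e in zip(t_row, e_row)]
            for t_row, e_row in zip(target_encoder, encoder)]

# Toy dimensions, chosen only for illustration.
rng = random.Random(0)
OBS_DIM, LATENT_DIM, ACTION_DIM = 6, 3, 2
encoder = make_matrix(LATENT_DIM, OBS_DIM, rng)
target_encoder = [row[:] for row in encoder]
predictor = make_matrix(LATENT_DIM, LATENT_DIM + ACTION_DIM, rng)

obs_now = [rng.random() for _ in range(OBS_DIM)]
obs_future = [rng.random() for _ in range(OBS_DIM)]
action = [0.1, -0.2]

loss = latent_prediction_loss(encoder, target_encoder, predictor,
                              obs_now, action, obs_future)
target_encoder = ema_update(target_encoder, encoder)
```

In a real system the encoders and predictor would be deep networks trained by gradient descent. A slowly updated target encoder is one common way to keep the representations from collapsing.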

JEPA for Long-Horizon Robotic Planning

Submitted to NeurIPS 2026. JEPA-based architectures trained for extended planning over physical action sequences in robotics tasks.

NeurIPS 2026 · Under Review

Direction 02

Open Science · Pro Bono · Armenian AI & Low-Resource Languages

Armenian AI — and a blueprint for every underrepresented language.

We are Armenians. We invest our own budget and time into building AI infrastructure for Armenian — not because it's a business, but because it matters. The methods we develop translate directly to Georgian, Uzbek, and any other low-resource language.

🇦🇲 Released

ATE-1 & ATE-2

Armenian Text Embeddings — SOTA embedding models for Armenian. Outperform Gemini and OpenAI embeddings on every Armenian benchmark.

#1 on ArmBench-TextEmbed

Open

ArmBench Leaderboards

ArmBench-TextEmbed and ArmBench-LLM — the first open benchmarks for evaluating text embedding and LLM performance in Armenian.

First Armenian benchmarks

In progress

Armenian LLM

Training an LLM with a localized tokenizer from scratch. Existing models are tokenizer-inefficient for Armenian — we're fixing that with our own data pipeline.

Active training
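
One way to quantify "tokenizer-inefficient" is fertility: the average number of tokens a tokenizer spends per word. The sketch below uses a pure byte-level fallback as a worst case, since every Armenian letter takes two bytes in UTF-8. The function names are hypothetical, and real subword tokenizers merge bytes into larger units, so their numbers will differ. The point is only that a vocabulary with few Armenian merges drifts toward this upper bound.

```python
def byte_fallback_tokenize(text):
    """Worst-case tokenizer: one token per UTF-8 byte (illustrative assumption)."""
    return list(text.encode("utf-8"))

def fertility(text, tokenize):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(tokenize(word)) for word in words) / len(words)

english = "Armenia"      # 7 ASCII letters, 1 byte each
armenian = "Հայաստան"    # 8 Armenian letters, 2 bytes each

english_fertility = fertility(english, byte_fallback_tokenize)
armenian_fertility = fertility(armenian, byte_fallback_tokenize)
```

Any callable that maps a string to a list of tokens can be passed as `tokenize`, so the same function can compare an existing tokenizer against a localized one on the same Armenian text.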

Published · EACL 2026 · LoResLM Workshop

Adapting Text Embeddings to Low-Resource Languages with Noisy Translations

SOTA embedding quality in a low-resource language using only 10k noisy translations — no large parallel corpora required. Applied to Armenian; validated on Georgian and Uzbek. Method generalises to any low-resource language.

EACL 2026 · Published

Direction 03

Past Research · Document AI

We pioneered visual document retrieval. Then moved on.

We built ColPali-style multimodal embeddings for visual document retrieval and held the #1 rank on the ViDoRe benchmark globally. The field matured and became crowded — we turned our attention to harder problems.

ViDoRe Benchmark · Globally #1 (held)
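
"ColPali-style" refers to multi-vector retrieval with late interaction: a query and a page image are each embedded as a set of vectors, and relevance is the sum, over query vectors, of the best-matching page vector. The sketch below shows that scoring rule (often called MaxSim) with toy two-dimensional vectors. The function names and data are hypothetical and are not taken from the released models.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector take its best match
    among the page vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

def rank_pages(query_vecs, pages):
    """Return page ids sorted from most to least relevant."""
    scores = {page_id: maxsim_score(query_vecs, vecs)
              for page_id, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: two query token vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # has a strong match for each query vector
    "page_b": [[0.5, 0.5]],              # one patch that partially matches both
}
ranking = rank_pages(query, pages)
```

Because each query vector is matched independently, a page can score well when different regions of it answer different parts of the query.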

ColQwen Multimodal Document Embedding Series

Visual document retrieval embeddings ranked #1 on the ViDoRe benchmark. The models are still available and used in production. Research direction discontinued.

View on Hugging Face →

Everything is open.

All our models and benchmarks are public. We believe open science makes everyone better.

Metric-AI on Hugging Face · Collaborate with us →