Proprietary edge-case datasets from the environments
your benchmark has never seen.

Three verticals. Task-engineered collection. Annotation verified by inter-annotator agreement (IAA) on every batch. Browse what's available, or scope a custom collection for edge cases not in the library.

Autonomous Vehicles · Available

India Road Edge Cases — v1.2

For AV teams whose models disengage at Indian intersections that no Western benchmark has ever included.


Scenario Coverage

Unmarked intersections · Monsoon visibility degradation · Contraflow · Cattle crossings · Two-wheeler swarms · Signal jumps · Night low-light · Regional-script signage

Taxonomy
138-class Indian road actor taxonomy (KITTI: 8 classes · nuScenes: 23 classes)
Annotation type
Temporal intent sequences per tracked agent + per-frame bounding labels
IAA score
Mean Fleiss κ = 0.79 across scenario classes. Range: κ = 0.71 (ambiguous intent) to κ = 0.86 (vehicle class)
Collection
Task-engineered · on-ground · zero synthetic augmentation

COCO JSON · HuggingFace Dataset Card · Custom schema available

Physical Robotics · Available

Indian Home Manipulation — v0.9

For robotics teams whose manipulation policies fail in Indian residential environments after succeeding in the lab.


Environment Coverage

Gas stoves · Tava flips · Pressure cookers · Steel thalis · Deformable fabrics (saris, dupattas) · Kirana shelves · Mortar and pestle · Cluttered domestic spaces

Environments
1,000+ distinct home and workspace settings. Urban · Tier 2 · Tier 3 Indian cities
Failure mode taxonomy
7 grasp failure subtypes · 4 slip event subtypes · Spatial affordance error · Occlusion misidentification
Annotation type
Action segmentation · Grasp keyframes · Object trajectory · 6-DOF pose · Contact surface labels
Object coverage
Steel thalis · Pressure cookers · Clay vessels · Jute textiles · Brass — all labeled, absent from public benchmarks

HDF5 (teleop-compatible) · LeRobot format · Dataset card included

Voice & Dialect · Available

India Dialect Corpus — v1.1

For ASR and NLU teams whose models break on Hinglish, Bhojpuri, and regional Indian speech that IndicSUPERB averaged away.


Language Coverage

Hindi · Bhojpuri · Marathi · Tamil · Telugu · Kannada · Malayalam · Gujarati · Bengali · Odia · + regional dialect subgroups

Coverage
12+ Indian languages · 40+ dialect subgroups · Age, gender, region balanced per specification
Annotation type
Phoneme alignment · Code-switch boundary markers · Dialect attribution · Intent labeling · Acoustic SNR score per file
Ambiguity policy
Disagreements preserved and flagged — not averaged. Annotator dialect metadata attached to each divergence.
Domain batches
Healthcare symptom vocabulary · Agricultural terminology · Legal and financial services

TextGrid (Praat) · JSON with speaker metadata · Whisper / Canary fine-tune ready

Healthcare · Q3 2026

India Clinical & Dermatology — Preview

For medical AI teams whose models underperform on Indian clinical presentations, skin tone diversity, and regional diagnostic language.


What's Coming

Dermatological conditions across Fitzpatrick IV–VI skin tones · Clinical NLP for Indian diagnostic language patterns · Radiology report annotation with Indian disease prevalence priors

Every dataset in this catalog was built the same way.

01 — Collection

Task-engineered briefs. Not open uploads.

Every contributor receives a specific scenario specification before collecting a single clip. On-device QA pre-checks before anything enters the pipeline. Contributor cohort active across 14 Indian states.

02 — Annotation

Fleiss κ on every batch. Below threshold: re-review.

Multi-pass: model-assisted pre-annotation, human correction, independent QA audit. Fleiss kappa IAA scoring per delivery. A batch below threshold goes to expert review — not to delivery. Disagreements are flagged and shipped with the data.
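
For readers who want to reproduce the score on a delivered batch, the gating step described above can be sketched as follows. This is a minimal illustration of the standard Fleiss kappa formula, not Birha's pipeline code. The names `fleiss_kappa`, `batch_passes`, and `ASSUMED_THRESHOLD` are hypothetical, and the 0.70 cutoff is an assumption, since the threshold is not stated here.

```python
from typing import Sequence

# Hypothetical cutoff for illustration only; the actual threshold is not published here.
ASSUMED_THRESHOLD = 0.70


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of annotations")
    n_cats = len(counts[0])

    # Observed agreement: mean over items of the share of agreeing annotator pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)

    if p_e == 1.0:
        # Every annotation fell in a single category; treat as full agreement.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


def batch_passes(counts: Sequence[Sequence[int]],
                 threshold: float = ASSUMED_THRESHOLD) -> bool:
    """True if the batch meets the (assumed) kappa threshold for delivery."""
    return fleiss_kappa(counts) >= threshold


if __name__ == "__main__":
    # Toy batch: 4 items, 3 annotators each, 3 categories.
    demo = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [0, 0, 3]]
    kappa = fleiss_kappa(demo)
    route = "delivery" if batch_passes(demo) else "expert review"
    print(f"kappa={kappa:.3f} -> {route}")
```

Each row of the input table is one annotated item and each column one label class, so the same function can be run per scenario class to mirror a per-class IAA report.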

03 — Delivery

Dataset card with every order. Not a folder of files.

Every delivery includes: annotation files, HuggingFace-standard dataset card, IAA report per scenario class, disagreement flag index, version changelog, and provenance chain. PII redacted at the annotation layer.

Don't see the edge case your model is failing on?

Birha scopes, task-engineers, and delivers bespoke datasets to your exact specification. Scenario brief, geographic targeting, annotation schema, quality guarantee. Milestone-based delivery.

If the failure mode exists in the real world, we can build a dataset for it.

Timeline
8 weeks from scoping to delivery
Minimum scope
A clearly defined scenario class or failure mode
Quality guarantee
Same Fleiss κ IAA protocol as pre-built datasets

Your model has a data problem we can name precisely.

Tell us the vertical, the deployment environment, and the failure mode you're seeing. We'll tell you within 48 hours whether we have a dataset that addresses it or can build one.

Enterprise inbound reviewed within 48 hours.