Controlled environment capture

Factory-floor data robots can learn from.

World Archive runs consent-first egocentric capture at managed partner sites across India — textile floors, packaging lines, workshops, and service bays — with annotations, manual QA, and buyer-ready delivery built in.

9

live task verticals in sample pack

218

verb–noun action segments

8+

annotation layers per clip

100%

of clips pass CI QA

What we capture & deliver

Modalities and annotation layers your robotics team can inspect.

Headband egocentric video, IMU, optional wrist and exocentric cameras — plus structured labels from action segments through object boxes. Status reflects our live 9-clip sample pack.

Egocentric (headband)

Live in sample

Stable head-mounted POV — Galaxy S24 / iPhone 16 Pro Max, 30fps

IMU (wrist + head)

Live in sample

Motion traces synchronized to video timestamps

Wrist camera

Available

Close-range tool contact and fine-grain assembly

Exocentric

Available

Fixed overhead / side scene context

21-joint hand keypoints

Live in sample

Per-frame 2D landmarks, L/R, confidence scores

Object boxes + track IDs

Live in sample

Segment-aware grounding; persistent object IDs

Action segments

Live in sample

Granular verb–noun phases (median ~12s) with task notes

Hand–object contact

Live in sample

Contact samples derived from hand and object bounding-box overlap

Depth

On request

Sensor depth where the hardware supports it; monocular depth estimates are marked as such

3D hand / camera pose

On request

Extended capture rigs on pilot programs

See the data

Switch layers. Read the JSON.

Plain video, hand skeleton overlays, and object bounding boxes — with the matching annotation JSON for the same moment in time.

Interactive data explorer

Industrial Sewing

factory · Samsung Galaxy S24 · GGN_20260618_S02

Full sample on S3
Plain video · t=12s · 6s loop
{
  "start_sec": 11,
  "end_sec": 17,
  "action": "align",
  "object": "cloth_strip",
  "task": "sew_garment",
  "notes": "Pick new cloth strip; align under needle bar"
}

Annotation depth

Granular verb–noun segments, not just raw video.

218 action segments across 9 clips — each with action, object, task, and operator notes. Median segment ~12s; many phases in the 5–15s range suited to imitation-learning episodes. Every clip ships with 21-joint hand keypoints, object boxes, hand boxes, and contact samples.

Action segments

218 verb–noun manipulation phases across 9 clips. Each segment carries action, object, task, and operator notes — aligned to imitation-learning episode granularity.

{ "action": "align", "object": "cloth_strip", "start_sec": 11.0, "end_sec": 17.0 }
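A minimal sketch of how a buyer might read these records, assuming action segments arrive as one JSON object per line with the fields shown above. The helper names `load_segments` and `segment_durations` are illustrative, and the second sample record is invented for the example; neither comes from the sample pack.

```python
import json
from statistics import median

def load_segments(jsonl_text):
    """Parse JSONL text: one action-segment object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

def segment_durations(segments):
    """Duration in seconds of each segment (end_sec minus start_sec)."""
    return [seg["end_sec"] - seg["start_sec"] for seg in segments]

# First record is copied from the page; the second is hypothetical.
sample = "\n".join([
    '{"action": "align", "object": "cloth_strip", "start_sec": 11.0, "end_sec": 17.0}',
    '{"action": "stitch", "object": "cloth_strip", "start_sec": 17.0, "end_sec": 29.0}',
])

segments = load_segments(sample)
durations = segment_durations(segments)
median_duration = median(durations)
print(durations, median_duration)
```

Running the same two helpers over a real clip's segment file would give the per-clip median to compare against the ~12s figure quoted here.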

21-joint hand keypoints

Frame-level 2D landmarks for left and right hands. Roll-up stats per clip: hands-detected %, two-hands %.

{ "frame_idx": 360, "hands": [{ "label": "right", "score": 0.94, "landmarks_21": "..." }] }
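The roll-up stats can be recomputed from the frame records. The sketch below assumes "hands-detected %" means the share of frames with at least one hand and "two-hands %" means the share with both a left and a right hand; those definitions, the `hand_rollup` name, and the four sample frames are assumptions made for illustration.

```python
def hand_rollup(frames):
    """Return (hands_detected_pct, two_hands_pct) for a list of frame records.

    Assumed definitions: a frame counts as 'detected' if it lists at least
    one hand, and as 'two hands' if both a left and a right label appear.
    """
    total = len(frames)
    if total == 0:
        return 0.0, 0.0
    detected = 0
    both = 0
    for frame in frames:
        labels = {hand["label"] for hand in frame.get("hands", [])}
        if labels:
            detected += 1
        if {"left", "right"} <= labels:
            both += 1
    return 100.0 * detected / total, 100.0 * both / total

# Hypothetical frames in the shape shown above (landmarks omitted for brevity).
frames = [
    {"frame_idx": 360, "hands": [{"label": "right", "score": 0.94}]},
    {"frame_idx": 361, "hands": [{"label": "left", "score": 0.90},
                                 {"label": "right", "score": 0.92}]},
    {"frame_idx": 362, "hands": []},
    {"frame_idx": 363, "hands": [{"label": "right", "score": 0.88}]},
]

detected_pct, two_hands_pct = hand_rollup(frames)
print(detected_pct, two_hands_pct)
```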

Object bounding boxes

2D boxes with persistent track IDs. Segment-aware GPU grounding for tools and manipulated objects. Separate hand_boxes.jsonl per clip.

{ "class": "garment", "track_id": 2, "bbox": [0.22, 0.41, 0.58, 0.72], "confidence": 0.87 }
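Because contact samples are derived from box overlap, a buyer can reproduce a rough version of that check. This sketch assumes `bbox` is normalized `[x_min, y_min, x_max, y_max]`, which the page does not state, and that any positive overlap counts as contact. The function names, the `min_overlap` threshold, and both hand boxes are hypothetical.

```python
def intersection_area(box_a, box_b):
    """Overlap area of two boxes given as [x_min, y_min, x_max, y_max]."""
    width = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    height = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    # Negative width or height means the boxes do not overlap on that axis.
    return max(0.0, width) * max(0.0, height)

def in_contact(hand_box, object_box, min_overlap=0.0):
    """True when the hand and object boxes overlap by more than min_overlap."""
    return intersection_area(hand_box, object_box) > min_overlap

garment = [0.22, 0.41, 0.58, 0.72]    # bbox from the record above
hand_near = [0.50, 0.60, 0.70, 0.80]  # hypothetical hand box overlapping the garment
hand_far = [0.70, 0.10, 0.90, 0.30]   # hypothetical hand box elsewhere in the frame

print(in_contact(hand_near, garment), in_contact(hand_far, garment))
```

Raising `min_overlap` trades recall for precision; the threshold used in the shipped contact samples is not documented here, so treat this as a starting point only.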

Metadata + QA

Per-clip JSON: device, mount, session, consent, SHA256, manipulator stats, CI QA flags. Machine-readable qa_report.json for all 9 clips.

{ "qa_flags": { "continuous_integration_qa_pass": true, "schema_validated": true } }
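One way to use the machine-readable flags is as an ingestion gate. The sketch below assumes each clip's report exposes a `qa_flags` object like the one above; the `clip_passes_qa` helper and the failing report are illustrative only.

```python
import json

REQUIRED_FLAGS = ("continuous_integration_qa_pass", "schema_validated")

def clip_passes_qa(report, required=REQUIRED_FLAGS):
    """True only if every required flag is present and exactly True."""
    flags = report.get("qa_flags", {})
    return all(flags.get(name) is True for name in required)

# Report copied from the page, followed by a hypothetical failing report.
passing = json.loads(
    '{ "qa_flags": { "continuous_integration_qa_pass": true, "schema_validated": true } }'
)
failing = {"qa_flags": {"continuous_integration_qa_pass": True, "schema_validated": False}}

print(clip_passes_qa(passing), clip_passes_qa(failing))
```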

Manual QA

Automated first pass. Human verification before delivery.

  • Automated structural checks — resolution, duration alignment, schema validation
  • Hand visibility and occlusion logging per clip
  • Human review of action segments and object label quality
  • Validation gate before promotion to the buyer-facing annotations/ directory
  • PII / face privacy review; audio stripped by default
  • Delivery manifest with SHA256 integrity hashes
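The SHA256 hashes in the delivery manifest can be checked after download. This is a sketch under one assumption: the manifest can be read into a mapping of relative file path to hex digest. The actual manifest layout is not shown on this page, and `sha256_of_file` and `verify_manifest` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large MP4s fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose hash does not match."""
    root = Path(root)
    mismatched = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of_file(target) != expected.lower():
            mismatched.append(rel_path)
    return mismatched

# Self-contained demo on a temporary file standing in for a delivered clip.
with tempfile.TemporaryDirectory() as tmp:
    payload = b"{}\n"
    (Path(tmp) / "clip.jsonl").write_bytes(payload)
    good_manifest = {"clip.jsonl": hashlib.sha256(payload).hexdigest()}
    bad_manifest = {"clip.jsonl": "0" * 64}
    ok_result = verify_manifest(tmp, good_manifest)
    bad_result = verify_manifest(tmp, bad_manifest)

print(ok_result, bad_result)
```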

Live environments

Nine real workplaces. One inspectable sample pack.

factory

Industrial Sewing

Cloth alignment, machine stitching, trim

24 segments · 54.4% hands visible

factory

Shuttle Tube Packaging

Bimanual packing, insert, seal cycles

29 segments · 70.2% hands visible

roadside shop

Cane Weaving

Hammer, nail, strap manipulation on chair frame

25 segments · 68.6% hands visible

factory

Heat Gun & Batching

Heat-seal shuttle tubes, batch rope

24 segments · 32.4% hands visible

factory

Garment Ironing

Iron, fold, pack garments

18 segments · 36.5% hands visible

restaurant

Commercial Catering

Ladle, season, stir in vessel

22 segments · 26.3% hands visible

car showroom

Car Detailing

Microfiber wipe, applicator pad

20 segments · 72.5% hands visible

repair shop

Primer & Painting

Mix, filter, spray-gun application

28 segments · 55.5% hands visible

roadside repair

Denting & Filing

Sander on car body panel

24 segments · 29.5% hands visible

Pipeline

Capture to delivery.

01

Capture

Partner-site sessions with headband ego rig, IMU, optional wrist/exo cameras.

02

Consent

Commercial AI-training consent collected before any buyer delivery.

03

Anonymize

Face/PII review; audio stripped from deliverable MP4s.

04

Annotate

Action segments, 21-pt hands, object boxes, contact, captions.

05

Manual QA

Human verification of segments, labels, and overlay quality.

06

Deliver

MP4 + JSONL + schema docs via S3 or buyer-defined format.

Delivery formats

Format        Contents
MP4           Raw ego video, overlay previews, box previews
JSONL         action_segments, hand_keypoints, object_boxes, hand_boxes, contact
JSON          Per-clip metadata, session, QA report, summaries
CSV           Frame ↔ timestamp mapping per clip
Schema docs   annotation_schema.md, object_taxonomy.md, action_taxonomy.md
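The CSV mapping is what ties frame-indexed files (hand keypoints, boxes) to second-indexed action segments. A minimal sketch follows, assuming the CSV has `frame_idx` and `timestamp_sec` columns and that segments are half-open intervals `[start_sec, end_sec)`. Both are assumptions; check annotation_schema.md for the real column names. The CSV rows and helper names are invented for illustration.

```python
import csv
import io

def load_frame_times(csv_text):
    """Map frame index to timestamp in seconds from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["frame_idx"]): float(row["timestamp_sec"]) for row in reader}

def segment_for_frame(frame_idx, frame_times, segments):
    """Return the segment containing the frame's timestamp, or None."""
    timestamp = frame_times[frame_idx]
    for seg in segments:
        # Assumed half-open interval: start is included, end is excluded.
        if seg["start_sec"] <= timestamp < seg["end_sec"]:
            return seg
    return None

# Hypothetical mapping rows at 30 fps (frame 360 falls at 12.0 s).
csv_text = "frame_idx,timestamp_sec\n330,11.0\n360,12.0\n510,17.0\n"
segments = [
    {"action": "align", "object": "cloth_strip", "start_sec": 11.0, "end_sec": 17.0},
]

frame_times = load_frame_times(csv_text)
hit = segment_for_frame(360, frame_times, segments)
miss = segment_for_frame(510, frame_times, segments)
print(hit, miss)
```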

FAQ

Common questions.

Is this synthetic or staged data?

No. All sample clips are real human demonstrations from named partner environments in India. No synthetic motion or generated humans.

What annotation granularity do you provide?

Verb–noun action segments (median ~12s), 21-joint hand keypoints per frame, object boxes with track IDs, hand–object contact samples, per-clip metadata and QA reports.

How is factory capture different from gig collection?

We run managed partner sites with trained operators, locked capture kits, aligned worker incentives, and short structured sessions — not ad-hoc roadside gig workers.

Can we inspect before buying?

Yes. The full 9-clip sample pack (19.4 GB) is on public S3 with schema docs, overlays, and QA reports. Use the Data Explorer on this site for a quick preview.

Do you support custom schemas?

Yes. Pilot and enterprise programs can align to buyer-defined action labels, object taxonomies, HDF5 export, and custom camera setups.

Ready to inspect?

9 clips · 19.4 GB · full schema docs on public S3. Or book a call to scope a pilot collection.