Egocentric (headband)
Live in sample
Stable head-mounted POV: Galaxy S24 / iPhone 16 Pro Max, 30 fps
Controlled environment capture
World Archive runs consent-first egocentric capture at managed partner sites across India — textile floors, packaging lines, workshops, and service bays — with annotations, manual QA, and buyer-ready delivery built in.
9
live task verticals in sample pack
218
verb–noun action segments
8+
annotation layers per clip
100%
of clips pass CI QA
What we capture & deliver
Headband egocentric video, IMU, optional wrist and exocentric cameras — plus structured labels from action segments through object boxes. Status reflects our live 9-clip sample pack.
Stable head-mounted POV — Galaxy S24 / iPhone 16 Pro Max, 30fps
Motion traces synchronized to video timestamps
Close-range tool contact and fine-grain assembly
Fixed overhead / side scene context
Per-frame 2D landmarks, L/R, confidence scores
Segment-aware grounding; persistent object IDs
Granular verb–noun phases (~8–15s median) with task notes
Derived contact samples from bbox overlap
Sensor depth where hardware supports; monocular marked
Extended capture rigs on pilot programs
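As a rough illustration of how contact samples might be derived from bbox overlap, the sketch below flags contact when a hand box overlaps an object box. The box layout (`[x_min, y_min, x_max, y_max]`, normalized to 0-1), the function names, and the 10% threshold are assumptions made for this example, not the rule used in the delivered data.

```python
from typing import Sequence

# Assumed layout: [x_min, y_min, x_max, y_max], normalized to the 0-1 range.
Box = Sequence[float]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection rectangle of two boxes (0.0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def in_contact(hand: Box, obj: Box, min_hand_fraction: float = 0.1) -> bool:
    """Flag contact when the overlap covers at least min_hand_fraction of the hand box.

    The threshold is a hypothetical default chosen for illustration.
    """
    hand_area = (hand[2] - hand[0]) * (hand[3] - hand[1])
    if hand_area <= 0:
        return False
    return overlap_area(hand, obj) / hand_area >= min_hand_fraction
```

Normalizing by the hand box rather than using IoU keeps a small hand on a large object from being scored as "barely overlapping"; a buyer with a different contact definition would swap in their own ratio.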
See the data
Plain video, hand skeleton overlays, and object bounding boxes — with the matching annotation JSON for the same moment in time.
Interactive data explorer
Industrial Sewing
factory · Samsung Galaxy S24 · GGN_20260618_S02
{
"start_sec": 11,
"end_sec": 17,
"action": "align",
"object": "cloth_strip",
"task": "sew_garment",
"notes": "Pick new cloth strip; align under needle bar"
}
Annotation depth
218 action segments across 9 clips — each with action, object, task, and operator notes. Median segment ~12s; many phases in the 5–15s range suited to imitation-learning episodes. Every clip ships with 21-joint hand keypoints, object boxes, hand boxes, and contact samples.
218 verb–noun manipulation phases across 9 clips. Each segment carries action, object, task, and operator notes — aligned to imitation-learning episode granularity.
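For readers who want to check segment lengths themselves, here is a minimal sketch that reads action-segment records and reports the median duration and the segments that fall in a 5-15s window. It assumes one JSON object per line with the `start_sec` and `end_sec` fields shown in the sample record on this page; the helper names are hypothetical, and only the first sample row comes from this page (the other two are made up for the example).

```python
import json
from statistics import median


def load_jsonl(text: str) -> list[dict]:
    """Parse JSONL text (one JSON object per non-empty line)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def segment_durations(segments: list[dict]) -> list[float]:
    """Duration in seconds of each segment."""
    return [seg["end_sec"] - seg["start_sec"] for seg in segments]


def episodes_in_range(segments: list[dict], min_sec: float = 5.0,
                      max_sec: float = 15.0) -> list[dict]:
    """Keep segments whose duration lies in [min_sec, max_sec]."""
    return [
        seg for seg in segments
        if min_sec <= seg["end_sec"] - seg["start_sec"] <= max_sec
    ]


# First row mirrors the sample record on this page; the rest are hypothetical.
SAMPLE = """
{"action": "align", "object": "cloth_strip", "start_sec": 11.0, "end_sec": 17.0}
{"action": "stitch", "object": "cloth_strip", "start_sec": 17.0, "end_sec": 37.0}
{"action": "trim", "object": "thread", "start_sec": 37.0, "end_sec": 40.0}
"""

if __name__ == "__main__":
    segs = load_jsonl(SAMPLE)
    print("median duration (s):", median(segment_durations(segs)))
    print("5-15s episodes:", [s["action"] for s in episodes_in_range(segs)])
```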
{ "action": "align", "object": "cloth_strip", "start_sec": 11.0, "end_sec": 17.0 }
Frame-level 2D landmarks for left and right hands. Roll-up stats per clip: hands-detected %, two-hands %.
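The per-clip roll-up stats could be recomputed from frame-level records along the following lines. This is a sketch under assumptions: it uses the `hands`, `label`, and `score` fields from the sample record on this page, takes the clip's total frame count as an argument (frames with no detections may be absent from the file), and applies a hypothetical 0.5 score cutoff.

```python
def hand_rollup(records: list[dict], total_frames: int,
                min_score: float = 0.5) -> dict:
    """Percentage of frames with any hand, and with both hands, above min_score."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")

    detected = 0
    two_hands = 0
    for rec in records:
        # Collect the hand labels in this frame that pass the score cutoff.
        labels = {
            hand["label"]
            for hand in rec.get("hands", [])
            if hand.get("score", 0.0) >= min_score
        }
        if labels:
            detected += 1
        if {"left", "right"} <= labels:
            two_hands += 1

    return {
        "hands_detected_pct": 100.0 * detected / total_frames,
        "two_hands_pct": 100.0 * two_hands / total_frames,
    }
```

How the published percentages treat low-confidence detections is not stated here, so results from this sketch may differ from the shipped summaries.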
{ "frame_idx": 360, "hands": [{ "label": "right", "score": 0.94, "landmarks_21": "..." }] }
2D boxes with persistent track IDs. Segment-aware GPU grounding for tools and manipulated objects. Separate hand_boxes.jsonl per clip.
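Persistent track IDs make it straightforward to follow one object through a clip. The sketch below groups box records by `track_id` and converts a box to pixel coordinates. It assumes `bbox` is `[x_min, y_min, x_max, y_max]` normalized to 0-1 (the schema docs are the authority on the actual layout); the function names, the confidence cutoff, and the 1920x1080 frame size in the usage note are illustrative.

```python
from collections import defaultdict


def to_pixels(bbox: list[float], width: int, height: int) -> tuple[int, int, int, int]:
    """Scale an assumed normalized [x_min, y_min, x_max, y_max] box to pixels."""
    x_min, y_min, x_max, y_max = bbox
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


def group_by_track(boxes: list[dict], min_confidence: float = 0.5) -> dict[int, list[dict]]:
    """Collect box records per track_id, dropping low-confidence detections."""
    tracks: dict[int, list[dict]] = defaultdict(list)
    for box in boxes:
        if box["confidence"] >= min_confidence:
            tracks[box["track_id"]].append(box)
    return dict(tracks)
```

For example, `to_pixels([0.22, 0.41, 0.58, 0.72], 1920, 1080)` would place the sample garment box on a hypothetical 1080p frame.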
{ "class": "garment", "track_id": 2, "bbox": [0.22, 0.41, 0.58, 0.72], "confidence": 0.87 }
Per-clip JSON: device, mount, session, consent, SHA256, manipulator stats, CI QA flags. Machine-readable qa_report.json for all 9 clips.
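A buyer-side intake check on this metadata might look like the sketch below: verify a file against its SHA256 and list any QA flags that are not true. The `qa_flags` key follows the sample record on this page; the function names and the idea of passing the expected digest as a hex string are assumptions, since the exact metadata field holding the checksum is not shown here.

```python
import hashlib


def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 digest of data with an expected hex string."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


def failed_qa_flags(report: dict) -> list[str]:
    """Names of QA flags in the report that are not exactly True."""
    flags = report.get("qa_flags", {})
    return sorted(name for name, ok in flags.items() if ok is not True)
```

An empty list from `failed_qa_flags` means every flag present passed; a report with no `qa_flags` key also returns an empty list, so a stricter check would first confirm the key exists.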
{ "qa_flags": { "continuous_integration_qa_pass": true, "schema_validated": true } }
Manual QA
Live environments
factory
Cloth alignment, machine stitching, trim
24 segments · 54.4% hands visible
factory
Bimanual packing, insert, seal cycles
29 segments · 70.2% hands visible
roadside shop
Hammer, nail, strap manipulation on chair frame
25 segments · 68.6% hands visible
factory
Heat-seal shuttle tubes, batch rope
24 segments · 32.4% hands visible
factory
Iron, fold, pack garments
18 segments · 36.5% hands visible
restaurant
Ladle, season, stir in vessel
22 segments · 26.3% hands visible
car showroom
Microfiber wipe, applicator pad
20 segments · 72.5% hands visible
repair shop
Mix, filter, spray-gun application
28 segments · 55.5% hands visible
roadside repair
Sander on car body panel
24 segments · 29.5% hands visible
Pipeline
01. Partner-site sessions with headband ego rig, IMU, optional wrist/exo cameras.
02. Commercial AI-training consent collected before any buyer delivery.
03. Face/PII review; audio stripped from deliverable MP4s.
04. Action segments, 21-pt hands, object boxes, contact, captions.
05. Human verification of segments, labels, and overlay quality.
06. MP4 + JSONL + schema docs via S3 or buyer-defined format.
Delivery formats
| Format | Contents |
|---|---|
| MP4 | Raw ego video, overlay previews, box previews |
| JSONL | action_segments, hand_keypoints, object_boxes, hand_boxes, contact |
| JSON | Per-clip metadata, session, QA report, summaries |
| CSV | Frame ↔ timestamp mapping per clip |
| Schema docs | annotation_schema.md, object_taxonomy.md, action_taxonomy.md |
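The frame ↔ timestamp CSV is what ties second-based segment boundaries to frame-indexed annotations. A minimal sketch of that lookup follows, assuming the CSV has `frame_idx` and `timestamp_sec` columns and is sorted by time; both column names and the sample rows are hypothetical, so check the schema docs for the real header.

```python
import csv
import io
from bisect import bisect_left, bisect_right


def load_frame_times(csv_text: str) -> list[tuple[int, float]]:
    """Read (frame_idx, timestamp_sec) pairs from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["frame_idx"]), float(row["timestamp_sec"])) for row in reader]


def frames_for_segment(frame_times: list[tuple[int, float]],
                       start_sec: float, end_sec: float) -> list[int]:
    """Frame indices whose timestamps fall in [start_sec, end_sec].

    Assumes frame_times is sorted by timestamp.
    """
    times = [t for _, t in frame_times]
    lo = bisect_left(times, start_sec)
    hi = bisect_right(times, end_sec)
    return [idx for idx, _ in frame_times[lo:hi]]


# Hypothetical rows at roughly 30 fps, starting at the 11-second mark.
SAMPLE_CSV = """frame_idx,timestamp_sec
330,11.0
331,11.033333
332,11.066667
333,11.1
"""

if __name__ == "__main__":
    mapping = load_frame_times(SAMPLE_CSV)
    print(frames_for_segment(mapping, 11.0, 11.07))
```

Looking timestamps up in the CSV rather than multiplying by a nominal 30 fps avoids drift if a capture has dropped or unevenly spaced frames.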
FAQ
No. All sample clips are real human demonstrations from named partner environments in India. No synthetic motion or generated humans.
Verb–noun action segments (median ~12s), 21-joint hand keypoints per frame, object boxes with track IDs, hand–object contact samples, per-clip metadata and QA reports.
We run managed partner sites with trained operators, locked capture kits, aligned worker incentives, and short structured sessions — not ad-hoc roadside gig workers.
Yes. The full 9-clip sample pack (19.4 GB) is on public S3 with schema docs, overlays, and QA reports. Use the Data Explorer on this site for a quick preview.
Yes. Pilot and enterprise programs can align to buyer-defined action labels, object taxonomies, HDF5 export, and custom camera setups.
9 clips · 19.4 GB · full schema docs on public S3. Or book a call to scope a pilot collection.