Franka Vision-Guided Fine Manipulation
Unconstrained Millimeter Bogie Alignment
Andnet DeBoer, Derek Dietz, Theo Coulson
Northwestern University
Overview
This project demonstrates precise fine manipulation using a Franka Emika robot arm to manipulate HO-scale model train cars with ±1mm accuracy. The system integrates a robust computer vision pipeline with MoveIt2 motion planning to solve a challenging alignment task: positioning free-spinning train bogies onto model railroad tracks.
The project also establishes a zero-shot data distillation pipeline for training custom object detection models, using the robot itself to autonomously collect and generate training data.
Problem Statement
Aligning model train cars onto tracks requires millimeter-level precision, and the bogies make this hard: when a car is lifted, its wheel assemblies swivel freely about their pivots, much like caster wheels. Traditional pick-and-place approaches fail because:
- Bogie orientation is unknown when the gripper approaches the train
- Track orientation varies across the layout and must be detected in real time
- Class similarity from a top-down view makes distinguishing trains from tracks challenging for vision systems
Solution
End Effector
Our solution uses a custom end effector to physically constrain the bogie to a known rotation, combined with a robust OpenCV pipeline to detect track orientation. The gripper then aligns the constrained wheel assembly with the detected track angle before placement.
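A minimal sketch of the final alignment step, assuming the detected track direction and the bogie direction held by the end effector are both available as yaw angles in degrees in the same frame. Because a straight track section is unchanged by a 180° turn, the gripper never needs to rotate more than 90° either way. The helper names are hypothetical, not taken from the project code.

```python
def wrap_to_half_turn(angle_deg: float) -> float:
    """Wrap an angle into [-90, 90), treating directions 180 degrees apart as equal."""
    return (angle_deg + 90.0) % 180.0 - 90.0


def alignment_rotation(track_yaw_deg: float, bogie_yaw_deg: float) -> float:
    """Hypothetical helper: smallest gripper rotation (degrees) that lines up the
    constrained bogie with the track, exploiting the track's 180-degree symmetry."""
    return wrap_to_half_turn(track_yaw_deg - bogie_yaw_deg)


if __name__ == "__main__":
    # Track at 170 deg, bogie held at 0 deg: rotating -10 deg is equivalent to +170 deg.
    print(alignment_rotation(170.0, 0.0))  # -10.0
```

Wrapping to a half turn keeps wrist motions short and avoids driving the last joint toward its limits for what is, geometrically, the same placement.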
System Architecture
```
┌─────────────┐     ┌─────────────────┐     ┌──────────────────┐
│  RealSense  │────▶│  Vision System  │────▶│  Conductor Node  │
│   Camera    │     │  (Track + Car)  │     │                  │
└─────────────┘     └─────────────────┘     └────────┬─────────┘
                                                     │
                                       Target Poses + Gripper States
                                                     ▼
┌─────────────┐     ┌─────────────────┐     ┌──────────────────┐
│ Franka Arm  │◀────│   MoveIt2 API   │◀────│      Railer      │
│             │     │                 │     │                  │
└─────────────┘     └─────────────────┘     └──────────────────┘
```
Computer Vision Pipeline
Track Detection
A multi-stage OpenCV pipeline processes RGB images from the RealSense camera to detect track orientation:
- Preprocessing: Brightness, contrast, and white balance adjustment
- Edge Detection: Canny edge detection on enhanced images
- Morphological Operations: Dilation and skeletonization to extract rail centerlines
- Line Detection: Hough transform to identify track segments
- Pose Estimation: Convert 2D track orientation to 3D transforms using depth data
Train Detection & Classification
Zero-Shot Data Distillation Pipeline
| Stage | Method | Output |
|---|---|---|
| Data Collection | Franka conical scans of each train car → ROS bags | 30,000 RGB-D frames |
| Frame Extraction | Every 10th frame sampled | ~3,000 images |
| Auto-Labeling | Grounding DINO + SAM2 | Bounding boxes (~70% accurate) |
| Manual Refinement | Human correction | Clean training labels |
| Model Training | YOLOv8-OBB | Oriented bounding box detection |
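Two of the mechanical steps in the table above, frame sampling and label export, can be sketched in a few lines. This assumes the frames are already decoded into a list and that each auto-label arrives as four pixel corner points (for example from `cv2.boxPoints(cv2.minAreaRect(...))` on a SAM2 mask, which is an assumption here). Labels are written in the Ultralytics OBB text format: a class index followed by four corner points normalized to [0, 1]. Function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def sample_every_nth(frames: Sequence, n: int = 10) -> list:
    """Keep every n-th frame (the pipeline above samples every 10th)."""
    return list(frames[::n])


def to_yolo_obb_line(class_id: int, corners: Sequence[Point],
                     img_w: int, img_h: int) -> str:
    """Format one oriented box as 'class x1 y1 x2 y2 x3 y3 x4 y4',
    with pixel corners normalized by image width and height."""
    if len(corners) != 4:
        raise ValueError("an oriented box needs exactly 4 corners")
    coords = []
    for x, y in corners:
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return " ".join([str(class_id)] + coords)


if __name__ == "__main__":
    print(len(sample_every_nth(list(range(30000)))))  # 3000
    print(to_yolo_obb_line(3, [(100, 50), (300, 50), (300, 150), (100, 150)], 640, 480))
```

Keeping export as plain text lines makes the manual-refinement stage easy: a human correction only rewrites one line per object.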
Training Challenges
The vision system required adversarial training to handle edge cases:
- Tracks misclassified as trains (similar dark, elongated shapes)
- Trains misclassified as tracks, especially from a top-down view, where the two classes look most alike
Model Architecture & Training
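```python
from ultralytics import YOLO

# Start from the pretrained YOLOv8-nano oriented-bounding-box checkpoint.
model = YOLO('yolov8n-obb.pt')

results = model.train(
    data='dataset.yaml',     # dataset config: image paths and class names
    epochs=60,
    imgsz=640,
    batch=16,
    device=0,                # first CUDA GPU
    name='augmented_model',  # run name (output directory)
    # Augmentation settings
    mosaic=1.0,              # probability of 4-image mosaic
    copy_paste=0.4,          # probability of copy-paste augmentation
    degrees=10,              # random rotation range, +/- degrees
    translate=0.1,           # random translation, fraction of image size
    scale=0.5,               # random scale gain, +/- 50%
    shear=2,                 # random shear, +/- degrees
)
```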
Results
- mAP50: 0.95+
- mAP50-95: 0.85+
- Precision: 0.92+
- Recall: 0.90+
Train Car Classes
The system recognizes 12 distinct train car types and 2 switches.
Key Features
Oriented Bounding Boxes (OBB)
Standard axis-aligned bounding boxes are insufficient for rotated objects. We use oriented bounding boxes that include rotation angle, enabling:
- More accurate object localization
- Direct extraction of train orientation for gripper alignment
- Better handling of diagonal track sections
```python
import cv2

# Fit an oriented bounding box to a detection's contour (an Nx1x2 point array)
center, (width, height), angle = cv2.minAreaRect(contour)
```
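The raw `angle` from `cv2.minAreaRect` describes whichever side OpenCV reports as `width`, and its range has changed between OpenCV releases, so it is not directly the train's heading. A small normalization step, sketched below as a hypothetical helper, re-derives the long-axis direction and wraps it to [-90, 90) degrees in image coordinates (y down).

```python
from typing import Tuple

RotatedRect = Tuple[Tuple[float, float], Tuple[float, float], float]


def long_axis_angle(rect: RotatedRect) -> float:
    """Long-axis direction of a cv2.minAreaRect result, in [-90, 90) degrees.

    The reported angle follows the 'width' side, so when height is the longer
    side we rotate by 90 degrees to point along the car's length instead.
    """
    _center, (width, height), angle = rect
    if height > width:
        angle += 90.0  # reported width is the short side; switch to the long side
    return (angle + 90.0) % 180.0 - 90.0
```

Because a train car's long axis is symmetric end to end, the result is only defined modulo 180°, which matches the half-turn wrapping used for gripper alignment.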
Rail Rejection
To prevent false positives where track sections are detected as trains:
- Aspect ratio filtering (trains have characteristic length/width ratios)
- Context-aware rejection (objects on tracks vs. beside tracks)
- Multi-frame temporal consistency
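The three rejection rules above can be combined into one per-frame gate followed by a short voting window. This is a sketch only: the aspect-ratio bounds, the `on_track` context flag (assumed to come from a separate overlap test against detected track), and the 4-of-5 voting window are illustrative assumptions, not the project's tuned values.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Detection:
    label: str      # e.g. "train" or "track"
    length: float   # long side of the oriented box, pixels
    width: float    # short side of the oriented box, pixels
    on_track: bool  # hypothetical context flag from a track-overlap test


def passes_aspect_ratio(det: Detection, lo: float = 2.0, hi: float = 6.0) -> bool:
    """Reject boxes whose length/width ratio is implausible for a car (bounds assumed)."""
    if det.width <= 0:
        return False
    return lo <= det.length / det.width <= hi


class TemporalFilter:
    """Accept a train only if it passed the per-frame checks in k of the last n frames."""

    def __init__(self, n: int = 5, k: int = 4) -> None:
        self.history: Deque[bool] = deque(maxlen=n)
        self.k = k

    def update(self, det: Optional[Detection]) -> bool:
        # Assumed context policy: keep only "train" boxes sitting on a detected track.
        ok = (det is not None and det.label == "train"
              and det.on_track and passes_aspect_ratio(det))
        self.history.append(ok)
        return sum(self.history) >= self.k
```

Voting over a sliding window trades a few frames of latency for robustness against single-frame misclassifications, which suits a pick-and-place loop where the scene is mostly static.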
Train Centering
Precise centroid calculation using SAM2 segmentation masks:
- Generate instance segmentation mask
- Calculate mask centroid
- Project to 3D using depth alignment
- Publish as TF transform for motion planning
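The first three steps can be sketched with NumPy, assuming the depth image is already aligned to the color image and converted to meters, and that the intrinsics (`fx`, `fy`, `cx`, `cy`) come from the aligned stream. It uses a plain pinhole model without distortion correction (pyrealsense2's `rs2_deproject_pixel_to_point` handles the full model), and taking the median depth under the mask is our assumption for robustness to depth holes. Broadcasting the result as a TF frame (for example with `tf2_ros.TransformBroadcaster`) is not shown.

```python
from typing import Tuple

import numpy as np


def mask_centroid(mask: np.ndarray) -> Tuple[float, float]:
    """Pixel centroid (u, v) of a binary instance mask, e.g. from SAM2."""
    vs, us = np.nonzero(mask)  # row indices are v, column indices are u
    if us.size == 0:
        raise ValueError("empty mask")
    return float(us.mean()), float(vs.mean())


def deproject(u: float, v: float, z: float,
              fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) at depth z into the camera frame."""
    return (u - cx) * z / fx, (v - cy) * z / fy, z


def train_center_3d(mask: np.ndarray, depth_m: np.ndarray,
                    fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """3D centroid in the camera frame using the median valid depth under the mask."""
    u, v = mask_centroid(mask)
    z = depth_m[mask.astype(bool)]
    z = z[z > 0]  # RealSense reports 0 where depth is missing
    if z.size == 0:
        raise ValueError("no valid depth under mask")
    return deproject(u, v, float(np.median(z)), fx, fy, cx, cy)
```

Using the whole mask rather than the bounding-box center keeps the estimate on the car body even when the box is loose or the car sits on a curve.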
Hardware
- Robot: Franka Emika Panda 7-DOF arm
- Camera: Intel RealSense D435 (RGB + Depth)
- End Effector: Custom 3D-printed gripper with bogie constraint mechanism
- Trains: HO-scale (1:87) model railroad cars
Software Stack
- ROS 2 Kilted
- MoveIt2 - Motion planning
- OpenCV - Image processing
- Ultralytics YOLOv8 - Object detection
- Grounding DINO - Open-vocabulary detection
- SAM2 - Instance segmentation