Fast Stereo Visual Odometry
kitty-vo-v2.py is an optimized stereo visual odometry pipeline for KITTI-style sequences.
Features
Compared to the earlier version, this implementation improves speed by using:
- FAST + Lucas–Kanade optical flow instead of ORB + brute-force matching
- Keyframe-based depth refresh instead of recomputing stereo depth every frame
- Optional StereoBM matcher for faster disparity computation
- Forward-backward optical flow consistency for better outlier rejection
- Reuse of the camera matrix and stereo matcher
- Optional minimap overlay rendered directly onto the debug video
Usage
python kitty-vo-v2.py [sequence] [--max-frames N] [--depth-interval K] [--stereo sgbm|bm] [--no-debug]
Example:
python kitty-vo-v2.py frames --depth-interval 5 --stereo bm
Command-Line Arguments
| Argument | Description |
|---|---|
| sequence | Path to the sequence folder. Default: frames |
| --max-frames | Maximum number of frames to process |
| --depth-interval | Recompute stereo depth every N frames |
| --stereo | Stereo matcher: sgbm or bm |
| --no-debug | Disable debug visualization |
Expected Folder Structure
frames/
├── image_0/
│ ├── 000000.png
│ ├── 000001.png
│ └── ...
├── image_1/
│ ├── 000000.png
│ ├── 000001.png
│ └── ...
├── calib.txt
└── poses.txt # optional ground truth
Pipeline Overview
For each frame, the algorithm performs the following steps:
- Track previously detected image points using Lucas–Kanade optical flow
- Reject unstable tracks using a forward-backward consistency check
- Back-project valid previous-frame pixels into 3D using the stored depth map
- Estimate relative camera motion with solvePnPRansac
- Accumulate the estimated pose into a global trajectory
- Refresh stereo depth only every few frames, or when tracking quality drops
- Update the minimap and optional debug visualization
Minimap
The Minimap class renders a top-down trajectory overlay directly onto the video.
It displays:
- estimated trajectory
- current position
- optional ground-truth trajectory
Coordinate Mapping
The minimap uses the same convention as the trajectory plot:
- pose[:, 0] → horizontal map axis
- pose[:, 2] → vertical map axis
Since screen coordinates increase downward, the Z axis is flipped for display.
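The mapping above can be sketched roughly as follows. This is an illustration only, not the actual Minimap code: the helper name world_to_minimap and the map_size and margin parameters are assumptions made for the example.

```python
import numpy as np

def world_to_minimap(poses_xyz, map_size=200, margin=10):
    """Map N x 3 camera positions to integer pixel coordinates in a square map.

    Hypothetical helper: the name, map_size, and margin are illustrative.
    """
    poses_xyz = np.asarray(poses_xyz, dtype=np.float64).reshape(-1, 3)
    x = poses_xyz[:, 0]  # horizontal map axis
    z = poses_xyz[:, 2]  # vertical map axis
    # A single scale for both axes keeps the trajectory shape undistorted.
    extent = max(x.max() - x.min(), z.max() - z.min(), 1e-6)
    scale = (map_size - 2 * margin) / extent
    u = margin + (x - x.min()) * scale
    # Screen rows grow downward, so Z is flipped: larger Z lands on a smaller row.
    v = (map_size - margin) - (z - z.min()) * scale
    return np.stack([u, v], axis=1).round().astype(int)

print(world_to_minimap([[0, 0, 0], [10, 0, 20]]).tolist())
# [[10, 190], [100, 10]]
```

The start of the path ends up near the bottom of the map and the point with the larger Z near the top, which matches the flipped-Z convention.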
Main Methods
update(pose_xyz)
Adds the current pose to the minimap trajectory and periodically recomputes the map scale.
draw(frame)
Draws the minimap overlay onto a BGR frame.
Configuration Constants
| Constant | Meaning |
|---|---|
| MAX_FEATURES | Target number of tracked points |
| MIN_FEATURES | Minimum number of tracked points before refresh |
| FAST_THRESHOLD | FAST detector threshold |
| MAX_DEPTH_M | Maximum accepted depth in meters |
| RANSAC_REPROJ_ERR | Reprojection error threshold for PnP RANSAC |
| RANSAC_ITERS | Maximum RANSAC iterations |
| MIN_INLIERS | Minimum number of inliers required for valid motion |
LK Tracking Parameters
LK_PARAMS = dict(
    winSize=(21, 21),
    maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.01),
)
Calibration Loading
Calibration is read from calib.txt using:
def load_calib(calib_path: str):
    ...
The function loads:
- P0: left camera projection matrix
- P1: right camera projection matrix
These are used to extract:
fx = P0[0, 0]
fy = P0[1, 1]
cx = P0[0, 2]
cy = P0[1, 2]
baseline = abs(P1[0, 3] / P1[0, 0])
The baseline formula follows from the first row of the right projection matrix of a rectified pair, where b is the baseline:
P1 = [fx 0 cx -fx*b; ...]
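As a quick check of the extraction above, the sketch below builds a synthetic pair of projection matrices and recovers the baseline from them. The helper name intrinsics_from_projection and all numeric values are made up for the example; they are not taken from load_calib or from a real calib.txt.

```python
import numpy as np

def intrinsics_from_projection(P0, P1):
    """Extract fx, fy, cx, cy and the stereo baseline from 3 x 4 projection matrices.

    Hypothetical helper that mirrors the indexing shown above.
    """
    fx, fy = P0[0, 0], P0[1, 1]
    cx, cy = P0[0, 2], P0[1, 2]
    # For a rectified pair, P1[0, 3] = -fx * b, so dividing by fx recovers b.
    baseline = abs(P1[0, 3] / P1[0, 0])
    return fx, fy, cx, cy, baseline

# Synthetic calibration (illustrative values): fx = 700 px, baseline = 0.5.
P0 = np.array([[700.0, 0.0, 600.0, 0.0],
               [0.0, 700.0, 180.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
P1 = P0.copy()
P1[0, 3] = -700.0 * 0.5

fx, fy, cx, cy, baseline = intrinsics_from_projection(P0, P1)
print(baseline)  # 0.5
```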
Stereo Depth Estimation
Stereo depth is computed from disparity.
make_stereo(mode)
Creates the stereo matcher:
- bm: faster, less accurate
- sgbm: slower, more accurate
StereoBM
cv2.StereoBM_create(numDisparities=128, blockSize=15)
StereoSGBM
cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,
    blockSize=7,
    P1=8 * 3 * 7**2,
    P2=32 * 3 * 7**2,
    disp12MaxDiff=1,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=32,
)
compute_depth(stereo, left, right, fx, baseline)
Computes the depth map from a stereo pair.
Formula
depth = (fx * baseline) / disparity
Behavior
- disparity values <= 1.0 are treated as invalid
- invalid depth values are stored as NaN
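A minimal NumPy sketch of the formula and the invalid-value handling described above. It assumes the disparity is already a float array in pixels; OpenCV's block matchers return fixed-point disparities scaled by 16, so a division by 16 is assumed to have happened beforehand. The helper name disparity_to_depth and the min_disp parameter are illustrative, not the script's own.

```python
import numpy as np

def disparity_to_depth(disparity, fx, baseline, min_disp=1.0):
    """Convert a float disparity map (pixels) to depth; invalid pixels become NaN."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    valid = disparity > min_disp  # disparities <= min_disp are treated as invalid
    depth[valid] = (fx * baseline) / disparity[valid]
    return depth

disp = np.array([[0.0, 1.0],
                 [35.0, 70.0]])
depth = disparity_to_depth(disp, fx=700.0, baseline=0.5)
# First row is NaN (disparity <= 1.0); second row is 10.0 and 5.0.
```

Storing NaN instead of 0 lets later stages reject invalid pixels with a single np.isfinite check.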
Feature Detection
detect_fast(gray, n=MAX_FEATURES, mask=None)
Detects image corners using the FAST detector.
Behavior
- detects candidate keypoints
- sorts them by response
- keeps only the strongest n points
- returns them in shape (N, 1, 2) for OpenCV optical flow
Feature Refresh
refresh_features(gray, existing_pts, target=MAX_FEATURES)
Tops up the current set of tracked points.
How it works
- computes how many new points are needed
- masks out existing feature locations
- detects additional FAST corners in uncovered regions
- merges new and surviving points
Optical Flow Tracking
lk_track(prev_gray, curr_gray, prev_pts)
Tracks image points from the previous frame to the current frame using pyramidal Lucas–Kanade optical flow.
Forward-Backward Consistency Check
After forward tracking, the points are tracked backward:
- previous → current
- current → previous
The round-trip error is computed as:
fb_err = np.linalg.norm((prev_pts - back_pts).reshape(-1, 2), axis=1)
Only points satisfying all of the following are kept:
- forward tracking succeeded
- backward tracking succeeded
- forward-backward error is less than 1.0 pixel
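The three conditions above can be combined into one boolean mask. The sketch below assumes the forward and backward Lucas–Kanade passes have already run and produced (N, 1, 2) point arrays and (N, 1) status arrays, which is the layout cv2.calcOpticalFlowPyrLK returns. The helper name forward_backward_mask and the sample numbers are hypothetical.

```python
import numpy as np

def forward_backward_mask(prev_pts, back_pts, st_fwd, st_bwd, max_err=1.0):
    """Return a boolean mask of tracks that pass the round-trip check."""
    fb_err = np.linalg.norm((prev_pts - back_pts).reshape(-1, 2), axis=1)
    ok_fwd = st_fwd.reshape(-1).astype(bool)  # forward tracking succeeded
    ok_bwd = st_bwd.reshape(-1).astype(bool)  # backward tracking succeeded
    return ok_fwd & ok_bwd & (fb_err < max_err)

# Four tracks: good, large round-trip error, forward failure, good.
prev_pts = np.array([[[10, 10]], [[20, 20]], [[30, 30]], [[40, 40]]], np.float32)
back_pts = np.array([[[10.2, 10.1]], [[23, 20]], [[30, 30]], [[40, 40.5]]], np.float32)
st_fwd = np.array([[1], [1], [0], [1]], np.uint8)
st_bwd = np.array([[1], [1], [1], [1]], np.uint8)

valid = forward_backward_mask(prev_pts, back_pts, st_fwd, st_bwd)
print(valid.tolist())  # [True, False, False, True]
```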
Return Values
- curr_pts: tracked points in the current image
- valid: boolean mask of reliable tracks
3D Reconstruction
pts_to_3d(pts_2d, depth_map, fx, fy, cx, cy)
Back-projects 2D image points into 3D coordinates using the depth map.
Formula
x = (u - cx) * z / fx
y = (v - cy) * z / fy
z = depth
Validation
A point is accepted only if:
- it lies inside the image bounds
- the depth value is finite
- the depth is positive
- the depth is below MAX_DEPTH_M
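The back-projection and the validity checks above could look roughly like this in NumPy. It is a sketch under stated assumptions, not the script's pts_to_3d: the name backproject is hypothetical, depth is sampled at the nearest pixel, and max_depth=50.0 is a placeholder because the value of MAX_DEPTH_M is not given here.

```python
import numpy as np

def backproject(pts_2d, depth_map, fx, fy, cx, cy, max_depth=50.0):
    """Back-project pixels to 3D camera coordinates; returns (pts3d, valid)."""
    pts = np.asarray(pts_2d, dtype=np.float64).reshape(-1, 2)
    h, w = depth_map.shape
    # Nearest-pixel depth lookup (an assumption; interpolation is also possible).
    u = np.round(pts[:, 0]).astype(int)
    v = np.round(pts[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    z = np.full(len(pts), np.nan)
    z[inside] = depth_map[v[inside], u[inside]]
    with np.errstate(invalid="ignore"):
        valid = inside & np.isfinite(z) & (z > 0) & (z < max_depth)
    x = (pts[:, 0] - cx) * z / fx
    y = (pts[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1), valid

depth_map = np.full((4, 6), np.nan)
depth_map[2, 3] = 10.0   # valid depth at the principal point
depth_map[0, 5] = 20.0   # valid depth off-centre
depth_map[1, 1] = 80.0   # beyond max_depth
pts = [[3, 2], [5, 0], [1, 1], [0, 0], [9, 9]]
pts3d, valid = backproject(pts, depth_map, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
print(valid.tolist())  # [True, True, False, False, False]
```

Rows of pts3d where valid is False contain NaN or out-of-range values and should be dropped before they are passed to PnP.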
Return Values
- pts3d: reconstructed 3D points
- valid: boolean mask of points with valid depth
FastStereoVO
This class implements the main stereo visual odometry pipeline.
Constructor
vo = FastStereoVO(seq_path, stereo_mode="sgbm", depth_interval=5)
Parameters
| Parameter | Description |
|---|---|
| seq_path | Path to the KITTI-style sequence folder |
| stereo_mode | Stereo matcher: sgbm or bm |
| depth_interval | Number of frames between stereo depth recomputations |
Initialization
During initialization, the class:
- loads stereo calibration
- extracts intrinsic parameters and baseline
- builds the camera matrix K
- creates the stereo matcher
- loads sorted image paths from image_0/ and image_1/
Camera Matrix
self.K = np.array([
    [self.fx,       0, self.cx],
    [      0, self.fy, self.cy],
    [      0,       0,       1],
], np.float64)
Internal Helpers
_load(i)
Loads grayscale stereo images for frame i.
_depth(left, right)
Computes the depth map for a stereo pair using the configured stereo matcher.
Main Processing Loop
run(max_frames=None, show_debug=True, gt_path=None)
Processes the image sequence and returns the estimated trajectory.
Parameters
| Parameter | Description |
|---|---|
| max_frames | Maximum number of frames to process |
| show_debug | Whether to display the tracking window |
| gt_path | Optional path to ground-truth poses for the minimap |
Initialization
At the beginning of the run:
- pose is initialized to identity
- trajectory starts at the origin
- optional ground truth is loaded
- the minimap is initialized
- frame 0 is loaded
- an initial depth map is computed
- initial FAST features are detected
Per-Frame Steps
1. Load current stereo pair
The next left and right grayscale images are read.
2. Track points
Tracked features are propagated from the previous frame using Lucas–Kanade optical flow.
3. Build 3D–2D correspondences
- previous-frame tracked points are back-projected to 3D using prev_depth
- corresponding current-frame tracked points provide the 2D measurements
This produces:
- obj_pts: 3D points
- img_pts: 2D image points
4. Estimate relative pose
If at least MIN_INLIERS correspondences exist, motion is estimated with:
cv2.solvePnPRansac(
    obj_pts, img_pts, self.K, None,
    iterationsCount=RANSAC_ITERS,
    reprojectionError=RANSAC_REPROJ_ERR,
    confidence=0.999,
    flags=cv2.SOLVEPNP_AP3P,
)
5. Update global pose
If PnP succeeds and enough inliers are found:
- the Rodrigues vector is converted into a rotation matrix
- a relative transform is built
- the global pose is updated using the inverse relative motion
pose = pose @ np.linalg.inv(T)
The current translation is appended to the trajectory.
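The inverse is needed because PnP, given 3D points in the previous camera frame and 2D points in the current image, returns the transform that maps previous-frame coordinates into the current camera frame; the camera's own motion is the inverse of that. A small NumPy sketch of the accumulation follows. The helper name accumulate_pose is hypothetical, and R is assumed to be the rotation matrix already obtained from the Rodrigues vector.

```python
import numpy as np

def accumulate_pose(pose, R, tvec):
    """Compose the global 4 x 4 pose with the inverse of the PnP transform."""
    T = np.eye(4)
    T[:3, :3] = R                             # rotation: previous frame -> current frame
    T[:3, 3] = np.asarray(tvec).reshape(3)    # translation: previous frame -> current frame
    return pose @ np.linalg.inv(T)

# Camera moving 1 unit forward along +Z per frame, no rotation:
# scene points appear 1 unit closer in the current frame, so tvec = (0, 0, -1).
pose = np.eye(4)
trajectory = [pose[:3, 3].copy()]
for _ in range(2):
    pose = accumulate_pose(pose, np.eye(3), [0.0, 0.0, -1.0])
    trajectory.append(pose[:3, 3].copy())

# After two frames the camera position is approximately (0, 0, 2).
```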
6. Refresh depth if needed
Depth is recomputed if either:
- the age of the current depth map reaches depth_interval
- the number of tracked points falls below MIN_FEATURES
If depth is refreshed, surviving features are topped up with new FAST corners.
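The refresh rule reduces to a small predicate. The sketch below is illustrative: the function name needs_depth_refresh is hypothetical, and min_features=100 is a placeholder because the value of MIN_FEATURES is not stated here.

```python
def needs_depth_refresh(depth_age, n_tracked, depth_interval=5, min_features=100):
    """Return True when the stored depth map should be recomputed.

    depth_age: frames elapsed since the depth map was last computed.
    n_tracked: number of points that survived tracking in this frame.
    """
    too_old = depth_age >= depth_interval
    too_few = n_tracked < min_features
    return too_old or too_few

print(needs_depth_refresh(depth_age=2, n_tracked=800))  # False
print(needs_depth_refresh(depth_age=5, n_tracked=800))  # True
print(needs_depth_refresh(depth_age=1, n_tracked=54))   # True
```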
7. Update debug display
If debug mode is enabled:
- current tracked points are drawn
- the minimap is composited onto the frame
- the frame is shown in a window
Press Esc to stop early.
Timing Output
At the end of processing, the method prints average timings per frame for:
- LK tracking
- solvePnP
- stereo depth
- total processing time excluding image loading
Example:
── Timing over 499 frames ──────────────────
LK tracking : 2.8 ms/frame
solvePnP : 0.7 ms/frame
Stereo depth: 4.5 ms/frame (every 5 frames)
Total (excl. imread): 8.0 ms/frame → ~125 fps potential
Return Value
np.array(trajectory)
This is an N x 3 array of estimated camera positions.
Trajectory Plotting
plot_trajectory(traj, gt_path=None)
Plots the estimated trajectory in the X-Z plane.
Behavior
- plots estimated trajectory
- optionally loads and plots ground truth from poses.txt
- uses equal axis scaling for proper shape comparison
Ground-truth poses are expected in KITTI format:
r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz
The translation column is extracted from each 3 x 4 pose matrix.
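One way to read this format with NumPy is sketched below. The helper name load_kitti_positions is hypothetical, and the two sample lines are made-up data, not real ground truth.

```python
import io
import numpy as np

def load_kitti_positions(source):
    """Read KITTI-format poses (12 values per line) and return N x 3 positions."""
    poses = np.loadtxt(source).reshape(-1, 3, 4)  # one 3 x 4 matrix per line
    return poses[:, :, 3]                         # last column holds (tx, ty, tz)

# Made-up sample: identity rotation, two different translations.
sample = io.StringIO(
    "1 0 0 0.0 0 1 0 0.0 0 0 1 0.0\n"
    "1 0 0 0.5 0 1 0 -0.1 0 0 1 2.0\n"
)
gt = load_kitti_positions(sample)
print(gt.shape)  # (2, 3)
```

Columns 0 and 2 of the result are the X and Z coordinates used for the X-Z plot.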
Example Console Output
Successful tracking output:
Frame 0001 | tracked= 842 | 3D-2D= 201 | inliers= 155
Frame 0002 | tracked= 801 | 3D-2D= 189 | inliers= 147
Frame 0003 | tracked= 790 | 3D-2D= 176 | inliers= 139
Failure example:
Frame 0012 | PnP failed (tracked=54)
Notes
- The script assumes rectified stereo image pairs
- Optical flow is faster than descriptor matching, but can be more sensitive to large appearance changes
- Carrying depth forward improves speed, but large frame-to-frame motion can reduce accuracy
- StereoBM is faster, while StereoSGBM is generally more accurate
- This is pure visual odometry, so drift accumulates over time
- Ground truth is optional and is used only for plotting and minimap scaling