Fast Stereo Visual Odometry

kitty-vo-v2.py is an optimized stereo visual odometry pipeline for KITTI-style sequences.

Features

Compared to the earlier version, this implementation makes the following changes, most of them aimed at speed:

  • FAST + Lucas–Kanade optical flow instead of ORB + brute-force matching
  • Keyframe-based depth refresh instead of recomputing stereo depth every frame
  • Optional StereoBM matcher for faster disparity computation
  • Forward-backward optical flow consistency for better outlier rejection
  • Reuse of the camera matrix and stereo matcher
  • Optional minimap overlay rendered directly onto the debug video

Usage

python kitty-vo-v2.py [sequence] [--max-frames N] [--depth-interval K] [--stereo sgbm|bm] [--no-debug]

Example:

python kitty-vo-v2.py frames --depth-interval 5 --stereo bm

Command-Line Arguments

Argument           Description
sequence           Path to the sequence folder. Default: frames
--max-frames       Maximum number of frames to process (N)
--depth-interval   Recompute stereo depth every K frames
--stereo           Stereo matcher: sgbm or bm
--no-debug         Disable debug visualization
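
The argument list above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: the helper name build_parser and the defaults for --depth-interval and --stereo are assumptions (borrowed from the FastStereoVO constructor signature shown later in this document).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(description="Fast stereo visual odometry")
    # Optional positional argument; falls back to the documented default folder.
    parser.add_argument("sequence", nargs="?", default="frames",
                        help="path to the sequence folder")
    parser.add_argument("--max-frames", type=int, default=None,
                        help="maximum number of frames to process")
    # Default of 5 is an assumption, not confirmed by the script.
    parser.add_argument("--depth-interval", type=int, default=5,
                        help="recompute stereo depth every K frames")
    # Default of sgbm is an assumption, not confirmed by the script.
    parser.add_argument("--stereo", choices=["sgbm", "bm"], default="sgbm",
                        help="stereo matcher")
    parser.add_argument("--no-debug", action="store_true",
                        help="disable debug visualization")
    return parser


# Parse the example invocation from the Usage section.
args = build_parser().parse_args(["frames", "--depth-interval", "5", "--stereo", "bm"])
print(args.sequence, args.max_frames, args.depth_interval, args.stereo, args.no_debug)
```

Passing an explicit list to parse_args keeps the sketch independent of the real command line; the script itself would call parse_args() with no arguments.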

Expected Folder Structure

frames/
├── image_0/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
├── image_1/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
├── calib.txt
└── poses.txt   # optional ground truth

Pipeline Overview

For each frame, the algorithm performs the following steps:

  1. Track previously detected image points using Lucas–Kanade optical flow
  2. Reject unstable tracks using a forward-backward consistency check
  3. Back-project valid previous-frame pixels into 3D using the stored depth map
  4. Estimate relative camera motion with solvePnPRansac
  5. Accumulate the estimated pose into a global trajectory
  6. Refresh stereo depth only every few frames, or when tracking quality drops
  7. Update the minimap and optional debug visualization

Minimap

The Minimap class renders a top-down trajectory overlay directly onto the video.

It displays:

  • estimated trajectory
  • current position
  • optional ground-truth trajectory

Coordinate Mapping

The minimap uses the same convention as the trajectory plot:

  • pose[:, 0] → horizontal map axis
  • pose[:, 2] → vertical map axis

Since screen coordinates increase downward, the Z axis is flipped for display.
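
One way to express this mapping is sketched below. The function name world_to_map and the parameters center_xz, scale (pixels per meter) and map_size are hypothetical; the Minimap class may compute its scale and origin differently.

```python
import numpy as np


def world_to_map(xz, center_xz, scale, map_size):
    """Map world (X, Z) positions to minimap pixel coordinates (u, v)."""
    xz = np.asarray(xz, dtype=np.float64).reshape(-1, 2)
    half = map_size / 2.0
    # X grows to the right, exactly like the horizontal screen axis.
    u = half + (xz[:, 0] - center_xz[0]) * scale
    # Z is flipped because screen rows grow downward.
    v = half - (xz[:, 1] - center_xz[1]) * scale
    return np.stack([u, v], axis=1).round().astype(int)


# Two poses: the origin, then 1 m to the right and 10 m forward.
poses = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 10.0]])
pix = world_to_map(poses[:, [0, 2]], center_xz=(0.0, 0.0), scale=2.0, map_size=100)
print(pix)
```

With these inputs, forward motion (+Z) moves the marker toward the top of the map, which is the effect of the flip.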

Main Methods

update(pose_xyz)

Adds the current pose to the minimap trajectory and periodically recomputes the map scale.

draw(frame)

Draws the minimap overlay onto a BGR frame.


Configuration Constants

Constant            Meaning
MAX_FEATURES        Target number of tracked points
MIN_FEATURES        Minimum number of tracked points before refresh
FAST_THRESHOLD      FAST detector threshold
MAX_DEPTH_M         Maximum accepted depth in meters
RANSAC_REPROJ_ERR   Reprojection error threshold for PnP RANSAC
RANSAC_ITERS        Maximum RANSAC iterations
MIN_INLIERS         Minimum number of inliers required for valid motion

LK Tracking Parameters

LK_PARAMS = dict(
    winSize=(21, 21),
    maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.01),
)

Calibration Loading

Calibration is read from calib.txt using:

def load_calib(calib_path: str):
    ...

The function loads:

  • P0: left camera projection matrix
  • P1: right camera projection matrix

These are used to extract:

fx = P0[0, 0]
fy = P0[1, 1]
cx = P0[0, 2]
cy = P0[1, 2]
baseline = abs(P1[0, 3] / P1[0, 0])

The baseline formula follows from the first row of the right camera's projection matrix, where b is the baseline in meters:

P1 = [fx 0 cx -fx*b; ...]
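
As a worked example, the sketch below parses two projection-matrix lines and recovers the intrinsics and baseline. It assumes the "P0: v1 ... v12" line layout of KITTI calibration files. The helper name parse_calib and the numeric values are made up for illustration and are not taken from load_calib or a real sequence.

```python
import numpy as np


def parse_calib(text):
    """Parse 'NAME: 12 numbers' lines into a dict of 3x4 matrices."""
    mats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, values = line.split(":", 1)
        nums = [float(v) for v in values.split()]
        if len(nums) == 12:
            mats[key.strip()] = np.array(nums, dtype=np.float64).reshape(3, 4)
    return mats


# Illustrative values only: fx = fy = 700 px, baseline = 0.5 m.
sample = (
    "P0: 700.0 0.0 600.0 0.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0\n"
    "P1: 700.0 0.0 600.0 -350.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0\n"
)
mats = parse_calib(sample)
P0, P1 = mats["P0"], mats["P1"]

fx, fy = P0[0, 0], P0[1, 1]
cx, cy = P0[0, 2], P0[1, 2]
# P1[0, 3] = -fx * b, so b = |P1[0, 3] / fx|.
baseline = abs(P1[0, 3] / P1[0, 0])
print(fx, fy, cx, cy, baseline)
```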

Stereo Depth Estimation

Stereo depth is computed from disparity.

make_stereo(mode)

Creates the stereo matcher:

  • bm: faster, less accurate
  • sgbm: slower, more accurate

StereoBM

cv2.StereoBM_create(numDisparities=128, blockSize=15)

StereoSGBM

cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,
    blockSize=7,
    P1=8  * 3 * 7**2,
    P2=32 * 3 * 7**2,
    disp12MaxDiff=1,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=32,
)

compute_depth(stereo, left, right, fx, baseline)

Computes the depth map from a stereo pair.

Formula

depth = (fx * baseline) / disparity

Behavior

  • disparity values <= 1.0 are treated as invalid
  • invalid depth values are stored as NaN
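
The conversion and the invalid-value handling can be sketched in NumPy as below. The function name depth_from_disparity is hypothetical, and the sketch assumes the disparity is already expressed in pixels as floating point. OpenCV's block matchers return fixed-point disparities scaled by 16, so a real implementation would divide by 16 first.

```python
import numpy as np


def depth_from_disparity(disparity, fx, baseline, min_disp=1.0):
    """Convert a disparity map (pixels) to depth (meters); invalid pixels become NaN."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    # Disparities at or below the threshold are treated as invalid.
    valid = disparity > min_disp
    depth[valid] = (fx * baseline) / disparity[valid]
    return depth


disp = np.array([[0.0, 1.0, 35.0, 70.0]])
depth = depth_from_disparity(disp, fx=700.0, baseline=0.5)
print(depth)
```

Only the last two pixels exceed the threshold, so the first two entries stay NaN.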

Feature Detection

detect_fast(gray, n=MAX_FEATURES, mask=None)

Detects image corners using the FAST detector.

Behavior

  • detects candidate keypoints
  • sorts them by response
  • keeps only the strongest n points
  • returns them in shape (N, 1, 2) for OpenCV optical flow
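
The selection and reshaping steps can be illustrated without OpenCV. In the sketch below, keep_strongest is a hypothetical helper; the positions and responses stand in for the pt and response attributes of the cv2.KeyPoint objects that the FAST detector returns.

```python
import numpy as np


def keep_strongest(xy, response, n):
    """Keep the n highest-response points, shaped (N, 1, 2) float32 for optical flow."""
    xy = np.asarray(xy, dtype=np.float32).reshape(-1, 2)
    response = np.asarray(response, dtype=np.float32)
    # Sort by descending response and keep the first n indices.
    order = np.argsort(-response)[:n]
    return xy[order].reshape(-1, 1, 2)


xy = [(10, 20), (30, 40), (50, 60), (70, 80)]
response = [5.0, 40.0, 12.0, 33.0]
pts = keep_strongest(xy, response, n=2)
print(pts.shape, pts[:, 0].tolist())
```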

Feature Refresh

refresh_features(gray, existing_pts, target=MAX_FEATURES)

Tops up the current set of tracked points.

How it works

  • computes how many new points are needed
  • masks out existing feature locations
  • detects additional FAST corners in uncovered regions
  • merges new and surviving points
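
A minimal sketch of the masking and merging steps follows. The helper build_exclusion_mask, the square exclusion window, and the radius value are assumptions. In the script the mask would be handed to the FAST detector; here a fixed list of candidate points is filtered with the mask to emulate that.

```python
import numpy as np


def build_exclusion_mask(shape, pts, radius=10):
    """Return a uint8 mask: 255 where new features are allowed, 0 near existing points."""
    h, w = shape
    mask = np.full((h, w), 255, dtype=np.uint8)
    for x, y in np.asarray(pts, dtype=np.float64).reshape(-1, 2):
        x, y = int(round(x)), int(round(y))
        # Clamp the window to the image so out-of-bounds points cannot wrap around.
        y0, y1 = np.clip([y - radius, y + radius + 1], 0, h)
        x0, x1 = np.clip([x - radius, x + radius + 1], 0, w)
        mask[y0:y1, x0:x1] = 0
    return mask


existing = np.array([[[5.0, 5.0]]], dtype=np.float32)
mask = build_exclusion_mask((20, 30), existing, radius=2)

# Stand-in for newly detected corners: one too close to an existing point, one free.
candidates = np.array([[[6.0, 6.0]], [[20.0, 10.0]]], dtype=np.float32)
cols = candidates[:, 0, 0].astype(int)
rows = candidates[:, 0, 1].astype(int)
free = mask[rows, cols] > 0

# Merge surviving points with the accepted new ones.
merged = np.vstack([existing, candidates[free]])
print(merged.shape)
```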

Optical Flow Tracking

lk_track(prev_gray, curr_gray, prev_pts)

Tracks image points from the previous frame to the current frame using pyramidal Lucas–Kanade optical flow.

Forward-Backward Consistency Check

After forward tracking, the points are tracked backward:

  • previous → current
  • current → previous

The round-trip error is computed as:

fb_err = np.linalg.norm((prev_pts - back_pts).reshape(-1, 2), axis=1)

Only points satisfying all of the following are kept:

  • forward tracking succeeded
  • backward tracking succeeded
  • forward-backward error is less than 1.0 pixel

Return Values

  • curr_pts: tracked points in the current image
  • valid: boolean mask of reliable tracks
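
The three acceptance criteria combine into a single boolean mask, as sketched below. The function name fb_valid_mask is hypothetical. The status arrays mimic the (N, 1) uint8 status output of cv2.calcOpticalFlowPyrLK, and the point arrays are made-up values rather than real tracking results.

```python
import numpy as np


def fb_valid_mask(prev_pts, back_pts, st_fwd, st_bwd, max_err=1.0):
    """Combine forward status, backward status and round-trip error into one mask."""
    fb_err = np.linalg.norm((prev_pts - back_pts).reshape(-1, 2), axis=1)
    return (st_fwd.reshape(-1) == 1) & (st_bwd.reshape(-1) == 1) & (fb_err < max_err)


prev_pts = np.array([[[10.0, 10.0]], [[20.0, 20.0]], [[30.0, 30.0]]], dtype=np.float32)
# Point 0 returns within 0.1 px, point 1 drifts by 3 px, point 2 returns exactly.
back_pts = np.array([[[10.1, 10.0]], [[23.0, 20.0]], [[30.0, 30.0]]], dtype=np.float32)
st_fwd = np.array([[1], [1], [1]], dtype=np.uint8)
# The backward pass reports failure for point 2.
st_bwd = np.array([[1], [1], [0]], dtype=np.uint8)

valid = fb_valid_mask(prev_pts, back_pts, st_fwd, st_bwd)
print(valid)
```

Point 1 is rejected for its round-trip error and point 2 for its failed backward status, so only point 0 survives.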

3D Reconstruction

pts_to_3d(pts_2d, depth_map, fx, fy, cx, cy)

Back-projects 2D image points into 3D coordinates using the depth map.

Formula

z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy

Validation

A point is accepted only if:

  • it lies inside the image bounds
  • the depth value is finite
  • the depth is positive
  • the depth is below MAX_DEPTH_M

Return Values

  • pts3d: reconstructed 3D points
  • valid: boolean mask of points with valid depth
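
The back-projection and the four validity checks can be sketched as follows. The function name back_project, the rounding of pixel coordinates to the nearest integer for the depth lookup, and the max_depth value of 50 m are assumptions; the script's MAX_DEPTH_M and lookup strategy may differ.

```python
import numpy as np


def back_project(pts_2d, depth_map, fx, fy, cx, cy, max_depth=50.0):
    """Back-project pixels to 3D camera coordinates; returns (pts3d, valid)."""
    pts = np.asarray(pts_2d, dtype=np.float64).reshape(-1, 2)
    h, w = depth_map.shape
    u = np.round(pts[:, 0]).astype(int)
    v = np.round(pts[:, 1]).astype(int)

    # Check 1: the pixel lies inside the image bounds.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    z = np.full(len(pts), np.nan)
    z[inside] = depth_map[v[inside], u[inside]]

    # Checks 2-4: depth is finite, positive, and below the maximum range.
    z_safe = np.where(np.isfinite(z), z, -1.0)
    valid = inside & (z_safe > 0) & (z_safe < max_depth)

    x = (pts[:, 0] - cx) * z / fx
    y = (pts[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1), valid


depth_map = np.full((4, 4), 10.0)
depth_map[1, 2] = np.nan   # missing depth
depth_map[3, 3] = 80.0     # beyond max_depth
pts = np.array([[[3.0, 2.0]], [[2.0, 1.0]], [[9.0, 9.0]], [[3.0, 3.0]]])

pts3d, valid = back_project(pts, depth_map, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(pts3d[valid], valid)
```

Rows of pts3d where valid is False may contain NaN or out-of-range values, so callers should index with the mask before passing points to PnP.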

FastStereoVO

This class implements the main stereo visual odometry pipeline.

Constructor

vo = FastStereoVO(seq_path, stereo_mode="sgbm", depth_interval=5)

Parameters

Parameter        Description
seq_path         Path to the KITTI-style sequence folder
stereo_mode      Stereo matcher: sgbm or bm
depth_interval   Number of frames between stereo depth recomputations

Initialization

During initialization, the class:

  • loads stereo calibration
  • extracts intrinsic parameters and baseline
  • builds the camera matrix K
  • creates the stereo matcher
  • loads sorted image paths from image_0/ and image_1/

Camera Matrix

self.K = np.array([
    [self.fx,       0, self.cx],
    [      0, self.fy, self.cy],
    [      0,       0,       1],
], np.float64)

Internal Helpers

_load(i)

Loads grayscale stereo images for frame i.

_depth(left, right)

Computes the depth map for a stereo pair using the configured stereo matcher.


Main Processing Loop

run(max_frames=None, show_debug=True, gt_path=None)

Processes the image sequence and returns the estimated trajectory.

Parameters

Parameter    Description
max_frames   Maximum number of frames to process
show_debug   Whether to display the tracking window
gt_path      Optional path to ground-truth poses for the minimap

Initialization

At the beginning of the run:

  • pose is initialized to identity
  • trajectory starts at the origin
  • optional ground truth is loaded
  • the minimap is initialized
  • frame 0 is loaded
  • an initial depth map is computed
  • initial FAST features are detected

Per-Frame Steps

1. Load current stereo pair

The next left and right grayscale images are read.

2. Track points

Tracked features are propagated from the previous frame using Lucas–Kanade optical flow.

3. Build 3D–2D correspondences

  • previous-frame tracked points are back-projected to 3D using prev_depth
  • corresponding current-frame tracked points provide the 2D measurements

This produces:

  • obj_pts: 3D points
  • img_pts: 2D image points

4. Estimate relative pose

If at least MIN_INLIERS correspondences exist, motion is estimated with:

cv2.solvePnPRansac(
    obj_pts, img_pts, self.K, None,
    iterationsCount=RANSAC_ITERS,
    reprojectionError=RANSAC_REPROJ_ERR,
    confidence=0.999,
    flags=cv2.SOLVEPNP_AP3P,
)

5. Update global pose

If PnP succeeds and enough inliers are found:

  • the Rodrigues vector is converted into a rotation matrix
  • a relative transform is built
  • the global pose is updated using the inverse relative motion

pose = pose @ np.linalg.inv(T)

The current translation is appended to the trajectory.
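
The inverse is needed because PnP returns the transform that maps points from the previous camera frame into the current one, which is the opposite of the camera's own motion. The sketch below illustrates this with a NumPy stand-in for cv2.Rodrigues. The helper names rodrigues and accumulate are hypothetical, and the rvec and tvec values are made up rather than real solver output.

```python
import numpy as np


def rodrigues(rvec):
    """Axis-angle vector to 3x3 rotation matrix (NumPy stand-in for cv2.Rodrigues)."""
    rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def accumulate(pose, rvec, tvec):
    """Apply one relative PnP result (previous frame -> current frame) to the global pose."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(rvec)
    T[:3, 3] = np.asarray(tvec, dtype=np.float64).reshape(3)
    return pose @ np.linalg.inv(T)


# Camera moves 1 m forward per frame without rotating: static points appear
# 1 m closer in the current frame, so PnP would report t = (0, 0, -1).
pose = np.eye(4)
trajectory = [pose[:3, 3].copy()]
for _ in range(2):
    pose = accumulate(pose, rvec=[0.0, 0.0, 0.0], tvec=[0.0, 0.0, -1.0])
    trajectory.append(pose[:3, 3].copy())

R_y90 = rodrigues([0.0, np.pi / 2, 0.0])
print(np.array(trajectory))
```

After two frames the accumulated position is 2 m along +Z, even though each PnP translation pointed along -Z.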

6. Refresh depth if needed

Depth is recomputed if either:

  • the age of the current depth map reaches depth_interval
  • the number of tracked points falls below MIN_FEATURES

If depth is refreshed, surviving features are topped up with new FAST corners.
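
The refresh rule amounts to a two-condition predicate. In the sketch below, needs_depth_refresh is a hypothetical helper, min_features=200 is an illustrative value rather than the script's MIN_FEATURES, and incrementing the age counter before the check is an assumption about the loop order.

```python
def needs_depth_refresh(depth_age, n_tracked, depth_interval, min_features):
    """True when the depth map is too old or too few tracks remain."""
    return depth_age >= depth_interval or n_tracked < min_features


# Simulate 10 frames with a healthy, constant track count.
refreshed = []
depth_age = 0
for frame in range(1, 11):
    depth_age += 1
    if needs_depth_refresh(depth_age, n_tracked=800, depth_interval=5, min_features=200):
        refreshed.append(frame)
        depth_age = 0  # a fresh depth map resets the age counter

print(refreshed)
```

A drop in track count triggers an early refresh regardless of the depth map's age, for example needs_depth_refresh(2, 150, 5, 200) is True.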

7. Update debug display

If debug mode is enabled:

  • current tracked points are drawn
  • the minimap is composited onto the frame
  • the frame is shown in a window

Press Esc to stop early.

Timing Output

At the end of processing, the method prints average timings per frame for:

  • LK tracking
  • solvePnP
  • stereo depth
  • total processing time excluding image loading

Example:

── Timing over 499 frames ──────────────────
  LK tracking : 2.8 ms/frame
  solvePnP    : 0.7 ms/frame
  Stereo depth: 4.5 ms/frame  (every 5 frames)
  Total (excl. imread): 8.0 ms/frame  → ~125 fps potential

Return Value

np.array(trajectory)

This is an N x 3 array of estimated camera positions.


Trajectory Plotting

plot_trajectory(traj, gt_path=None)

Plots the estimated trajectory in the X-Z plane.

Behavior

  • plots estimated trajectory
  • optionally loads and plots ground truth from poses.txt
  • uses equal axis scaling for proper shape comparison

Ground-truth poses are expected in KITTI format:

r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz

The translation column is extracted from each 3 x 4 pose matrix.
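
Extracting the translation column is a reshape followed by a slice, as the sketch below shows. The helper name positions_from_kitti_poses and the two sample lines are made up for illustration; in practice the rows could be read from poses.txt with np.loadtxt.

```python
import numpy as np


def positions_from_kitti_poses(text):
    """Parse KITTI pose lines (12 values each) and return an N x 3 array of positions."""
    rows = np.array([[float(v) for v in line.split()]
                     for line in text.splitlines() if line.strip()])
    # Each row is a flattened 3x4 [R | t] matrix; column 3 holds tx, ty, tz.
    return rows.reshape(-1, 3, 4)[:, :, 3]


sample = (
    "1 0 0 0 0 1 0 0 0 0 1 0\n"
    "1 0 0 1 0 1 0 2 0 0 1 3\n"
)
gt_xyz = positions_from_kitti_poses(sample)
print(gt_xyz)
```

For the X-Z plot and the minimap, only columns 0 and 2 of the result are needed.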


Example Console Output

Successful tracking output:

Frame 0001 | tracked= 842 | 3D-2D= 201 | inliers= 155
Frame 0002 | tracked= 801 | 3D-2D= 189 | inliers= 147
Frame 0003 | tracked= 790 | 3D-2D= 176 | inliers= 139

Failure example:

Frame 0012 | PnP failed (tracked=54)

Notes

  • The script assumes rectified stereo image pairs
  • Optical flow is faster than descriptor matching, but can be more sensitive to large appearance changes
  • Reusing a depth map across several frames improves speed, but the map grows stale as the camera moves, so large frame-to-frame motion can reduce accuracy
  • StereoBM is faster, while StereoSGBM is generally more accurate
  • This is pure visual odometry, so drift accumulates over time
  • Ground truth is optional and is used only for plotting and minimap scaling