Fast Stereo Visual Odometry

kitty-vo-v2.py is an optimized stereo visual odometry pipeline for KITTI-style sequences.

Features

Compared to the earlier version, this implementation improves speed by using:

FAST + Lucas–Kanade optical flow instead of ORB + brute-force matching
Keyframe-based depth refresh instead of recomputing stereo depth every frame
Optional StereoBM matcher for faster disparity computation
Forward-backward optical flow consistency for better outlier rejection
Reuse of the camera matrix and stereo matcher
Optional minimap overlay rendered directly onto the debug video

Usage

python kitty-vo-v2.py [sequence] [--max-frames N] [--depth-interval K] [--stereo sgbm|bm] [--no-debug]

Example:

python kitty-vo-v2.py frames --depth-interval 5 --stereo bm

Command-Line Arguments

Argument	Description
`sequence`	Path to the sequence folder. Default: `frames`
`--max-frames`	Maximum number of frames to process
`--depth-interval`	Recompute stereo depth every `K` frames
`--stereo`	Stereo matcher: `sgbm` or `bm`
`--no-debug`	Disable debug visualization
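
A minimal sketch of how these arguments could be declared with argparse. The `frames` default comes from the table above; the other defaults (`None`, `5`, `sgbm`) and the `build_parser` name are assumptions for illustration and may differ from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(description="Fast stereo visual odometry")
    parser.add_argument("sequence", nargs="?", default="frames",
                        help="path to the sequence folder")
    parser.add_argument("--max-frames", type=int, default=None,
                        help="maximum number of frames to process")
    # Default of 5 is an assumption, not taken from the script.
    parser.add_argument("--depth-interval", type=int, default=5,
                        help="recompute stereo depth every K frames")
    # Default of sgbm is an assumption, not taken from the script.
    parser.add_argument("--stereo", choices=["sgbm", "bm"], default="sgbm",
                        help="stereo matcher")
    parser.add_argument("--no-debug", action="store_true",
                        help="disable debug visualization")
    return parser


# Same arguments as the usage example above.
args = build_parser().parse_args(["frames", "--depth-interval", "5", "--stereo", "bm"])
print(args.sequence, args.depth_interval, args.stereo, args.no_debug)
```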

Expected Folder Structure

frames/
├── image_0/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
├── image_1/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
├── calib.txt
└── poses.txt   # optional ground truth

Pipeline Overview

For each frame, the algorithm performs the following steps:

Track previously detected image points using Lucas–Kanade optical flow
Reject unstable tracks using a forward-backward consistency check
Back-project valid previous-frame pixels into 3D using the stored depth map
Estimate relative camera motion with solvePnPRansac
Accumulate the estimated pose into a global trajectory
Refresh stereo depth every depth_interval frames, or earlier when the number of tracked points falls below MIN_FEATURES
Update the minimap and optional debug visualization

`Minimap`

The Minimap class renders a top-down trajectory overlay directly onto the video.

It displays:

estimated trajectory
current position
optional ground-truth trajectory

Coordinate Mapping

The minimap uses the same convention as the trajectory plot:

pose[:, 0] → horizontal map axis
pose[:, 2] → vertical map axis

Since screen coordinates increase downward, the Z axis is flipped for display.
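
The mapping above can be sketched as follows. The function name `world_to_map` and the parameters `center_xz`, `scale`, and `map_size` are hypothetical; the actual Minimap class may compute its scale and origin differently.

```python
import numpy as np


def world_to_map(pose_xz, center_xz, scale, map_size):
    """Map world (X, Z) positions to minimap pixel coordinates.

    pose_xz   : (N, 2) world X and Z values, i.e. pose[:, [0, 2]]
    center_xz : world point drawn at the middle of the minimap
    scale     : pixels per meter
    map_size  : side length of the square minimap in pixels
    """
    pose_xz = np.asarray(pose_xz, dtype=np.float64).reshape(-1, 2)
    half = map_size / 2.0
    # X grows to the right on screen, so it maps directly.
    px = half + (pose_xz[:, 0] - center_xz[0]) * scale
    # Screen rows grow downward, so forward motion (+Z) must decrease the row.
    py = half - (pose_xz[:, 1] - center_xz[1]) * scale
    return np.stack([px, py], axis=1).astype(np.int32)


# Origin, 10 m forward, and 5 m to the right on a 200 px map at 2 px/m.
pts = world_to_map([[0.0, 0.0], [0.0, 10.0], [5.0, 0.0]], (0.0, 0.0), 2.0, 200)
```

Moving forward along +Z places the point higher on the map, and moving along +X places it further right.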

Main Methods

`update(pose_xyz)`

Adds the current pose to the minimap trajectory and periodically recomputes the map scale.

`draw(frame)`

Draws the minimap overlay onto a BGR frame.

Configuration Constants

Constant	Meaning
`MAX_FEATURES`	Target number of tracked points
`MIN_FEATURES`	Tracked-point count below which depth and features are refreshed
`FAST_THRESHOLD`	FAST detector threshold
`MAX_DEPTH_M`	Maximum accepted depth in meters
`RANSAC_REPROJ_ERR`	Reprojection error threshold for PnP RANSAC
`RANSAC_ITERS`	Maximum RANSAC iterations
`MIN_INLIERS`	Minimum number of inliers required for valid motion

LK Tracking Parameters

LK_PARAMS = dict(
    winSize=(21, 21),
    maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.01),
)

Calibration Loading

Calibration is read from calib.txt using:

def load_calib(calib_path: str):
    ...

The function loads:

P0: left camera projection matrix
P1: right camera projection matrix

These are used to extract:

fx = P0[0, 0]
fy = P0[1, 1]
cx = P0[0, 2]
cy = P0[1, 2]
baseline = abs(P1[0, 3] / P1[0, 0])

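The baseline formula comes from the first row of the rectified right-camera projection matrix, whose fourth entry is -fx * b (b is the baseline in meters):

P1 = [fx 0 cx -fx*b; ...]

Dividing P1[0, 3] by P1[0, 0] therefore gives -b, and abs() removes the sign.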
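
A minimal sketch of the parsing and extraction, assuming calib.txt contains lines of the form `P0: <12 numbers>` and `P1: <12 numbers>`. The helper name `parse_calib_text` and the sample values are made up for illustration; the real `load_calib` reads from a file path.

```python
import numpy as np


def parse_calib_text(text: str):
    """Parse 'P0: ...' and 'P1: ...' lines into 3x4 projection matrices."""
    mats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, values = line.split(":", 1)
        nums = [float(v) for v in values.split()]
        if len(nums) == 12:
            mats[key.strip()] = np.array(nums, dtype=np.float64).reshape(3, 4)
    return mats["P0"], mats["P1"]


# Illustrative values only (fx = 700 px, baseline = 0.5 m), not real KITTI calibration.
SAMPLE = """P0: 700.0 0.0 600.0 0.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0
P1: 700.0 0.0 600.0 -350.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0
"""

P0, P1 = parse_calib_text(SAMPLE)
fx, fy = P0[0, 0], P0[1, 1]
cx, cy = P0[0, 2], P0[1, 2]
baseline = abs(P1[0, 3] / P1[0, 0])  # -fx*b / fx = -b, sign removed by abs()
```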

Stereo Depth Estimation

Stereo depth is computed from disparity.

`make_stereo(mode)`

Creates the stereo matcher:

bm: faster, less accurate
sgbm: slower, more accurate

StereoBM

cv2.StereoBM_create(numDisparities=128, blockSize=15)

StereoSGBM

cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,
    blockSize=7,
    P1=8  * 3 * 7**2,
    P2=32 * 3 * 7**2,
    disp12MaxDiff=1,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=32,
)

`compute_depth(stereo, left, right, fx, baseline)`

Computes the depth map from a stereo pair.

Formula

depth = (fx * baseline) / disparity

Behavior

disparity values <= 1.0 are treated as invalid
invalid depth values are stored as NaN
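
The conversion can be sketched with NumPy alone. This assumes the disparity is already in pixels; OpenCV's StereoBM and StereoSGBM `compute()` return fixed-point disparities scaled by 16, so they are typically divided by 16.0 first. The name `disparity_to_depth` is hypothetical.

```python
import numpy as np


def disparity_to_depth(disparity, fx, baseline, min_disp=1.0):
    """Convert a disparity map (pixels) to depth (meters); invalid pixels become NaN."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    ok = disparity > min_disp          # disparities <= 1.0 are treated as invalid
    depth[ok] = (fx * baseline) / disparity[ok]
    return depth


# fx = 700 px and baseline = 0.5 m are illustrative values.
depth = disparity_to_depth([[0.0, 1.0, 2.0, 70.0]], fx=700.0, baseline=0.5)
```

With these values, a disparity of 70 px corresponds to 5 m and a disparity of 2 px to 175 m, which shows how quickly depth resolution degrades for distant points.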

Feature Detection

`detect_fast(gray, n=MAX_FEATURES, mask=None)`

Detects image corners using the FAST detector.

Behavior

detects candidate keypoints
sorts them by response
keeps only the strongest n points
returns them in shape (N, 1, 2) for OpenCV optical flow
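
The sort-and-truncate step can be sketched without OpenCV. In the script, the coordinates and responses would come from the `pt` and `response` attributes of the `cv2.KeyPoint` objects returned by the FAST detector; here they are passed in as plain arrays, and the name `keep_strongest` is hypothetical.

```python
import numpy as np


def keep_strongest(points_xy, responses, n):
    """Keep the n points with the highest response, shaped (N, 1, 2) float32."""
    points_xy = np.asarray(points_xy, dtype=np.float32).reshape(-1, 2)
    responses = np.asarray(responses, dtype=np.float32)
    # Negate so that argsort returns indices from strongest to weakest.
    order = np.argsort(-responses, kind="stable")[:n]
    return points_xy[order].reshape(-1, 1, 2)


pts = keep_strongest([[1, 1], [2, 2], [3, 3]], [10.0, 30.0, 20.0], n=2)
```

The (N, 1, 2) float32 layout is the point format that `cv2.calcOpticalFlowPyrLK` accepts.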

Feature Refresh

`refresh_features(gray, existing_pts, target=MAX_FEATURES)`

Tops up the current set of tracked points.

How it works

computes how many new points are needed
masks out existing feature locations
detects additional FAST corners in uncovered regions
merges new and surviving points
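
A rough sketch of the first two steps, using NumPy only. The square exclusion window and the `radius` value are assumptions; the script may instead draw filled circles with OpenCV. The names `build_detection_mask` and `points_needed` are hypothetical.

```python
import numpy as np


def points_needed(existing_pts, target):
    """Number of new corners required to reach the target count."""
    return max(target - len(existing_pts), 0)


def build_detection_mask(shape, existing_pts, radius=10):
    """Return a uint8 mask: 255 where detection is allowed, 0 near existing points."""
    h, w = shape
    mask = np.full((h, w), 255, dtype=np.uint8)
    for x, y in np.asarray(existing_pts, dtype=np.float32).reshape(-1, 2):
        # Clip the window to the image so out-of-bounds points cannot wrap around.
        x0, x1 = np.clip([int(x) - radius, int(x) + radius + 1], 0, w)
        y0, y1 = np.clip([int(y) - radius, int(y) + radius + 1], 0, h)
        mask[y0:y1, x0:x1] = 0
    return mask


existing = np.array([[[30.0, 20.0]]], dtype=np.float32)  # one surviving point at (x=30, y=20)
mask = build_detection_mask((50, 60), existing, radius=5)
missing = points_needed(existing, target=10)
```

The resulting mask can be passed as the `mask` argument of `detect_fast`, and the new points concatenated with the surviving ones.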

Optical Flow Tracking

`lk_track(prev_gray, curr_gray, prev_pts)`

Tracks image points from the previous frame to the current frame using pyramidal Lucas–Kanade optical flow.

Forward-Backward Consistency Check

After forward tracking, the points are tracked backward:

previous → current
current → previous

The round-trip error is computed as:

fb_err = np.linalg.norm((prev_pts - back_pts).reshape(-1, 2), axis=1)

Only points satisfying all of the following are kept:

forward tracking succeeded
backward tracking succeeded
forward-backward error is less than 1.0 pixel

Return Values

curr_pts: tracked points in the current image
valid: boolean mask of reliable tracks
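
The three acceptance conditions can be combined into a single mask. This sketch assumes the forward and backward status arrays have the (N, 1) uint8 layout returned by `cv2.calcOpticalFlowPyrLK`; the name `fb_valid_mask` and the sample points are illustrative.

```python
import numpy as np


def fb_valid_mask(prev_pts, back_pts, st_fwd, st_bwd, max_err=1.0):
    """Boolean mask of tracks that pass the forward-backward check."""
    fb_err = np.linalg.norm((prev_pts - back_pts).reshape(-1, 2), axis=1)
    return (st_fwd.ravel() == 1) & (st_bwd.ravel() == 1) & (fb_err < max_err)


prev_pts = np.array([[[10, 10]], [[20, 20]], [[30, 30]]], dtype=np.float32)
# Points obtained by tracking forward and then backward again.
back_pts = np.array([[[10.2, 10]], [[23, 20]], [[30, 30]]], dtype=np.float32)
st_fwd = np.array([[1], [1], [0]], dtype=np.uint8)   # third track lost going forward
st_bwd = np.array([[1], [1], [1]], dtype=np.uint8)

valid = fb_valid_mask(prev_pts, back_pts, st_fwd, st_bwd)
```

Only the first track survives: the second has a 3 px round-trip error and the third failed forward tracking.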

3D Reconstruction

`pts_to_3d(pts_2d, depth_map, fx, fy, cx, cy)`

Back-projects 2D image points into 3D coordinates using the depth map.

Formula

z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy

Validation

A point is accepted only if:

it lies inside the image bounds
the depth value is finite
the depth is positive
the depth is below MAX_DEPTH_M

Return Values

pts3d: reconstructed 3D points
valid: boolean mask of points with valid depth
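
A minimal NumPy sketch of the back-projection and the four validity checks. The `MAX_DEPTH_M` value, the rounding of pixel coordinates to the nearest integer, and the name `back_project` are assumptions; the script's `pts_to_3d` may sample the depth map differently.

```python
import numpy as np

MAX_DEPTH_M = 50.0  # assumed value for illustration


def back_project(pts_2d, depth_map, fx, fy, cx, cy, max_depth=MAX_DEPTH_M):
    """Back-project pixels to 3D; returns (pts3d, valid) for all input points."""
    pts = np.asarray(pts_2d, dtype=np.float64).reshape(-1, 2)
    h, w = depth_map.shape
    u = np.round(pts[:, 0]).astype(int)
    v = np.round(pts[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    z = np.full(len(pts), np.nan)
    z[inside] = depth_map[v[inside], u[inside]]   # depth is indexed as [row, col]
    with np.errstate(invalid="ignore"):
        valid = inside & np.isfinite(z) & (z > 0) & (z < max_depth)
    x = (pts[:, 0] - cx) * z / fx
    y = (pts[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1), valid


depth_map = np.full((4, 4), np.nan)
depth_map[1, 2] = 10.0    # valid depth at pixel (u=2, v=1)
depth_map[3, 3] = 100.0   # beyond MAX_DEPTH_M
pts3d, valid = back_project([[2, 1], [3, 3], [0, 0], [10, 10]],
                            depth_map, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Rows of `pts3d` where `valid` is False contain NaN or out-of-range values and should be filtered out before PnP.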

`FastStereoVO`

This class implements the main stereo visual odometry pipeline.

Constructor

vo = FastStereoVO(seq_path, stereo_mode="sgbm", depth_interval=5)

Parameters

Parameter	Description
`seq_path`	Path to the KITTI-style sequence folder
`stereo_mode`	Stereo matcher: `sgbm` or `bm`
`depth_interval`	Number of frames between stereo depth recomputations

Initialization

During initialization, the class:

loads stereo calibration
extracts intrinsic parameters and baseline
builds the camera matrix K
creates the stereo matcher
loads sorted image paths from image_0/ and image_1/

Camera Matrix

self.K = np.array([
    [self.fx,       0, self.cx],
    [      0, self.fy, self.cy],
    [      0,       0,       1],
], np.float64)

Internal Helpers

`_load(i)`

Loads grayscale stereo images for frame i.

`_depth(left, right)`

Computes the depth map for a stereo pair using the configured stereo matcher.

Main Processing Loop

`run(max_frames=None, show_debug=True, gt_path=None)`

Processes the image sequence and returns the estimated trajectory.

Parameters

Parameter	Description
`max_frames`	Maximum number of frames to process
`show_debug`	Whether to display the tracking window
`gt_path`	Optional path to ground-truth poses for the minimap

Initialization

At the beginning of the run:

pose is initialized to identity
trajectory starts at the origin
optional ground truth is loaded
the minimap is initialized
frame 0 is loaded
an initial depth map is computed
initial FAST features are detected

Per-Frame Steps

1. Load current stereo pair

The next left and right grayscale images are read.

2. Track points

Tracked features are propagated from the previous frame using Lucas–Kanade optical flow.

3. Build 3D–2D correspondences

previous-frame tracked points are back-projected to 3D using prev_depth
corresponding current-frame tracked points provide the 2D measurements

This produces:

obj_pts: 3D points
img_pts: 2D image points

4. Estimate relative pose

If at least MIN_INLIERS correspondences exist, motion is estimated with:

cv2.solvePnPRansac(
    obj_pts, img_pts, self.K, None,
    iterationsCount=RANSAC_ITERS,
    reprojectionError=RANSAC_REPROJ_ERR,
    confidence=0.999,
    flags=cv2.SOLVEPNP_AP3P,
)

5. Update global pose

If PnP succeeds and enough inliers are found:

the Rodrigues vector is converted into a rotation matrix
a relative transform is built
the global pose is updated using the inverse relative motion

pose = pose @ np.linalg.inv(T)

The current translation is appended to the trajectory.
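
The reason for the inverse: with 3D points expressed in the previous camera frame, PnP returns the transform that maps previous-camera coordinates into the current camera frame, and the camera's own motion is the inverse of that. A small sketch, taking the rotation matrix directly (in the script it comes from `cv2.Rodrigues`); the name `accumulate_pose` is hypothetical.

```python
import numpy as np


def accumulate_pose(pose, R, t):
    """Compose the global 4x4 pose with the inverse of the PnP transform (R, t)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=np.float64).ravel()
    return pose @ np.linalg.inv(T)


pose = np.eye(4)
trajectory = [pose[:3, 3].copy()]
# The camera moves 1 m forward per frame: static points get 1 m closer,
# so the PnP translation is (0, 0, -1) each time.
for _ in range(2):
    pose = accumulate_pose(pose, np.eye(3), [0.0, 0.0, -1.0])
    trajectory.append(pose[:3, 3].copy())
trajectory = np.array(trajectory)
```

After two steps the accumulated position is 2 m along +Z, even though each PnP translation points along -Z.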

6. Refresh depth if needed

Depth is recomputed if either:

the age of the current depth map reaches depth_interval
the number of tracked points falls below MIN_FEATURES

If depth is refreshed, surviving features are topped up with new FAST corners.
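
The refresh rule amounts to a two-condition test. In this sketch the `MIN_FEATURES` value and the name `should_refresh_depth` are assumptions, and the loop only simulates the age counter rather than running the stereo matcher.

```python
MIN_FEATURES = 200  # assumed value; the script defines its own constant


def should_refresh_depth(depth_age, n_tracked, depth_interval,
                         min_features=MIN_FEATURES):
    """True when the depth map is too old or too few points are still tracked."""
    return depth_age >= depth_interval or n_tracked < min_features


# Simulate 10 frames with a healthy track count and depth_interval = 5.
age = 0
refresh_frames = []
for frame in range(1, 11):
    age += 1
    if should_refresh_depth(age, n_tracked=500, depth_interval=5):
        refresh_frames.append(frame)
        age = 0  # a fresh depth map was just computed
```

With plenty of tracked points the refresh happens on frames 5 and 10; a drop below `MIN_FEATURES` would trigger it immediately regardless of age.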

7. Update debug display

If debug mode is enabled:

current tracked points are drawn
the minimap is composited onto the frame
the frame is shown in a window

Press Esc to stop early.

Timing Output

At the end of processing, the method prints average timings per frame for:

LK tracking
solvePnP
stereo depth
total processing time excluding image loading

Example:

── Timing over 499 frames ──────────────────
  LK tracking : 2.8 ms/frame
  solvePnP    : 0.7 ms/frame
  Stereo depth: 4.5 ms/frame  (every 5 frames)
  Total (excl. imread): 8.0 ms/frame  → ~125 fps potential
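
The total in this example is the sum of the three per-frame averages, which suggests the stereo depth figure is already averaged over all frames rather than only over the frames where depth is recomputed. The arithmetic, using the numbers from the sample output:

```python
# Per-frame averages copied from the example output above.
lk_ms, pnp_ms, depth_ms = 2.8, 0.7, 4.5

total_ms = lk_ms + pnp_ms + depth_ms
fps = 1000.0 / total_ms  # 1000 ms per second

print(f"{total_ms:.1f} ms/frame -> ~{fps:.0f} fps")
```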

Return Value

np.array(trajectory)

This is an N x 3 array of estimated camera positions.

Trajectory Plotting

`plot_trajectory(traj, gt_path=None)`

Plots the estimated trajectory in the X-Z plane.

Behavior

plots estimated trajectory
optionally loads and plots ground truth from poses.txt
uses equal axis scaling for proper shape comparison

Ground-truth poses are expected in KITTI format:

r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz

The translation column is extracted from each 3 x 4 pose matrix.
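
A short sketch of that extraction, parsing the text directly so it runs without a file on disk. The name `parse_gt_positions` and the sample rows are illustrative; for an actual poses.txt, `np.loadtxt(path)` yields the same N x 12 array.

```python
import numpy as np


def parse_gt_positions(text: str) -> np.ndarray:
    """Return an N x 3 array of (tx, ty, tz) from KITTI-format pose rows."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    poses = np.array(rows, dtype=np.float64).reshape(-1, 3, 4)
    return poses[:, :, 3]   # last column of each 3 x 4 matrix


# Two made-up poses: identity rotation, camera at the origin and then 1.5 m forward.
SAMPLE = """1 0 0 0.0 0 1 0 0.0 0 0 1 0.0
1 0 0 0.2 0 1 0 -0.1 0 0 1 1.5
"""

gt = parse_gt_positions(SAMPLE)
gt_x, gt_z = gt[:, 0], gt[:, 2]   # the axes used by the X-Z trajectory plot
```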

Example Console Output

Successful tracking output:

Frame 0001 | tracked= 842 | 3D-2D= 201 | inliers= 155
Frame 0002 | tracked= 801 | 3D-2D= 189 | inliers= 147
Frame 0003 | tracked= 790 | 3D-2D= 176 | inliers= 139

Failure example:

Frame 0012 | PnP failed (tracked=54)

Notes

The script assumes rectified stereo image pairs
Optical flow is faster than descriptor matching, but can be more sensitive to large appearance changes
Carrying depth forward improves speed, but large frame-to-frame motion can reduce accuracy
StereoBM is faster, while StereoSGBM is generally more accurate
This is pure visual odometry, so drift accumulates over time
Ground truth is optional and is used only for plotting and minimap scaling
