Why is every point offset by a constant amount after I convert WGS84 to UTM?

EPSG:4326 declares its axis order as (latitude, longitude) but drone and GIS files store (longitude, latitude). Without always_xy=True, pyproj swaps your inputs and produces a clean systematic offset, often around 500 m. Rebuild the transformer with always_xy=True to pin the (lon, lat) order.

Why are some transformed coordinates inf or NaN?

Those input points fall outside the valid bounds of the target projection, typically the wrong UTM zone or hemisphere for that longitude. Filter or re-zone the inputs before transforming; a FAIL_NAN status flag isolates them before they corrupt bundle adjustment.

My transform looks correct but coordinates are still off by a few centimetres. Why?

PROJ fell back to a Helmert approximation because the grid-based datum shift file was not found, smearing every point by sub-decimetre amounts. Point PROJ at the correct .gsb/.tif grids and use a compound or 3D target CRS so the vertical geoid shift is resolved too.

Converting WGS84 to Local Grid with Python

You exported your GNSS log in WGS84 (EPSG:4326), reprojected it to a local UTM or State Plane grid, dropped the eastings and northings into your bundle adjustment, and the orthomosaic came back shifted by a stubborn, constant amount — often a few centimetres, sometimes a clean ±500 m, and occasionally the run never starts at all because pyproj raised ProjError: no operation available. The symptom is almost always a systematic offset rather than scattered noise: every point is wrong by the same vector, which is the fingerprint of a coordinate-transformation problem, not a survey-measurement one. This page solves exactly that task — turning a table of WGS84 longitude/latitude/altitude fixes into a datum-correct local grid with a single focused Python routine. It is the hands-on companion to coordinate transformation workflows in PyProj, and it feeds the broader ground control point optimization and coordinate sync workflow that anchors aerial imagery to surveyed ground truth.
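
The "same vector on every point" fingerprint can be checked numerically before you change any CRS settings. The sketch below is illustrative and not part of the routine on this page: `offset_fingerprint` is a hypothetical helper that assumes you hold matching lists of transformed and surveyed (easting, northing) pairs in metres. It returns the mean residual vector and the scatter of the residuals around that mean.

```python
import math
import statistics


def offset_fingerprint(transformed, reference):
    """Summarise residuals between transformed and surveyed (E, N) pairs.

    Hypothetical helper. Returns (mean_dE, mean_dN, scatter), where scatter
    is the RMS distance of each residual from the mean residual vector.
    """
    d_e = [t[0] - r[0] for t, r in zip(transformed, reference)]
    d_n = [t[1] - r[1] for t, r in zip(transformed, reference)]
    mean_e = statistics.fmean(d_e)
    mean_n = statistics.fmean(d_n)
    # Spread of the residuals once the common shift is removed.
    spread = [(e - mean_e) ** 2 + (n - mean_n) ** 2 for e, n in zip(d_e, d_n)]
    scatter = math.sqrt(statistics.fmean(spread))
    return mean_e, mean_n, scatter


# Made-up coordinates: every point is shifted by the same (1.5, -0.8) m vector.
transformed = [(500010.0, 4100020.0), (500110.0, 4100120.0), (500210.0, 4100220.0)]
reference = [(e - 1.5, n + 0.8) for e, n in transformed]
mean_e, mean_n, scatter = offset_fingerprint(transformed, reference)
```

A mean vector that is large compared with the scatter points to a transformation problem. A scatter comparable to the mean points back to the survey measurements.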

Why the offset appears

Two mechanics produce a constant offset when shifting from a global geodetic reference to a project-specific grid, and a third failure mode cuts the run short before any output is trustworthy. Each one has a deterministic fix.

Axis-order ambiguity flips longitude and latitude. EPSG:4326 declares its axis order as (latitude, longitude), but virtually every drone and GIS file stores coordinates as (longitude, latitude). If you build a Transformer without always_xy=True, pyproj honours the authority order and silently swaps your inputs, which is what produces the classic clean ±500 m (or larger) offset. Forcing always_xy=True pins the order to (lon, lat) in and (easting, northing) out regardless of the authority declaration.
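
A cheap pre-flight check can catch some swapped columns before the transformer sees them. This is a minimal sketch with a hypothetical helper name, `likely_swapped`, and it relies only on the fact that a latitude can never exceed 90 degrees in magnitude. It cannot detect a swap when both columns happen to lie within ±90, so it complements `always_xy=True` rather than replacing it.

```python
def likely_swapped(lon_values, lat_values):
    """Return True when the 'lat' column cannot be latitudes but 'lon' could be.

    Hypothetical helper: a heuristic only, blind to swaps where both
    columns stay inside [-90, 90].
    """
    lat_out_of_range = any(abs(v) > 90.0 for v in lat_values)
    lon_fits_latitude = all(abs(v) <= 90.0 for v in lon_values)
    return lat_out_of_range and lon_fits_latitude


# Columns labelled (lon, lat) but actually holding (lat, lon) near San Francisco.
suspect = likely_swapped([37.77, 37.78], [-122.42, -122.41])
# Correctly ordered columns for the same area.
clean = likely_swapped([-122.42, -122.41], [37.77, 37.78])
```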

Missing grid shift files force a Helmert fallback. A correct datum transformation between WGS84 and a regional or legacy datum (NAD83, ETRS89, OSGB36) needs a grid-based shift file: a .gsb (NTv2) or .tif grid. When PROJ cannot find that grid it does not fail loudly; it falls back to a coarse Helmert approximation that smears every point by sub-decimetre to decimetre amounts. Confirm where PROJ is looking with pyproj.datadir.get_data_dir(), then point it at the grids with pyproj.datadir.set_data_dir() or the PROJ_DATA environment variable, and prefer a compound or 3D target CRS so vertical (geoid) shifts are resolved too. This is the same CRS contract enforced in managing coordinate reference systems in GDAL.
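
To see which grid files a directory actually holds, a stdlib-only sketch like the one below may help. Both function names are hypothetical. It assumes the search path comes from `PROJ_DATA` (or the older `PROJ_LIB` name used by earlier PROJ releases) and that several directories may be joined with the platform path separator. It only lists files by extension and does not verify that a grid covers your project area.

```python
import os
from pathlib import Path

GRID_SUFFIXES = {".gsb", ".tif"}  # NTv2 and GeoTIFF grid extensions


def proj_data_dirs(environ=os.environ):
    """Return the directories named by PROJ_DATA, falling back to PROJ_LIB."""
    raw = environ.get("PROJ_DATA") or environ.get("PROJ_LIB") or ""
    return [d for d in raw.split(os.pathsep) if d]


def find_grid_files(data_dir):
    """Return sorted grid file names in data_dir, or [] if it is not a directory."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.suffix.lower() in GRID_SUFFIXES)


# Report what each configured directory contains.
for directory in proj_data_dirs():
    print(directory, find_grid_files(directory))
```

An empty list for every directory is a strong hint that PROJ is working without grids and may be using a lower-accuracy operation.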

Unbounded loads exhaust RAM mid-batch. A naive pd.read_csv() of a multi-million-row RTK/PPK log, or in-memory accumulation of transformed rows, can trigger an OOM kill part-way through a run and leave a half-written output. Streaming the table in bounded chunks keeps the working set fixed and the run reproducible, whether the input holds a few hundred GCPs or billions of logged fixes.
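
The bounded-chunk idea can be sketched with the standard library alone. This is an illustration and not the parent pipeline's implementation: `iter_chunks` is a hypothetical helper, and the in-memory CSV stands in for a real log file. pandas offers the same pattern through the `chunksize` argument of `read_csv`.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Demonstration on an in-memory CSV; a real run would pass an open file handle.
sample = io.StringIO(
    "lon,lat,alt\n-122.4,37.8,12.0\n-122.5,37.9,14.0\n-122.6,38.0,9.0\n"
)
sizes = [len(chunk) for chunk in iter_chunks(csv.DictReader(sample), chunk_size=2)]
```

Only one chunk is held at a time, so peak memory depends on `chunk_size` and not on the length of the log.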

Minimal reproducible solution

The routine below reads a WGS84 coordinate table, builds the Transformer exactly once (it is expensive to construct, cheap to reuse), transforms every row in a single vectorised PROJ call, and tags each output with a PASS/FAIL status before it can reach the solver. It is intentionally tight — under 40 lines — so the accuracy-critical decisions are visible; the parent page wraps the same core in a chunked, CLI-driven pipeline for large inventories.

#!/usr/bin/env python3
"""wgs84_to_local.py — datum-safe WGS84 -> local grid transform for GCP tables."""
import sys

import numpy as np
import pandas as pd
from pyproj import Transformer
from pyproj.exceptions import ProjError


def to_local_grid(df: pd.DataFrame, target_epsg: int, tol_m: float = 0.02) -> pd.DataFrame:
    # Build the transformer once; always_xy forces (lon, lat) in / (E, N) out.
    try:
        tf = Transformer.from_crs(4326, target_epsg, always_xy=True)
    except ProjError as exc:                              # bad EPSG or missing grid
        sys.exit(f"CRS init failed: {exc} — check PROJ_DATA grids and EPSG code")

    lon = df["lon"].to_numpy(np.float64)
    lat = df["lat"].to_numpy(np.float64)
    alt = df.get("alt", pd.Series(np.zeros(len(df)))).to_numpy(np.float64)

    east, north, up = tf.transform(lon, lat, alt)         # one vectorised PROJ call
    out = pd.DataFrame({"easting": east, "northing": north, "elevation": up})

    out["status"] = "PASS"
    out.loc[(out["easting"] < 0) | (out["northing"] < 0), "status"] = "FAIL_BOUNDS"
    out.loc[~np.isfinite(out[["easting", "northing"]]).all(axis=1), "status"] = "FAIL_NAN"  # inf or NaN

    if {"ref_e", "ref_n"}.issubset(df.columns):           # optional known-point check
        drift = np.hypot(out["easting"] - df["ref_e"].to_numpy(),
                         out["northing"] - df["ref_n"].to_numpy())
        out.loc[(drift > tol_m) & (out["status"] == "PASS"), "status"] = "FAIL_DRIFT"
    return out


if __name__ == "__main__":
    table = pd.read_csv(sys.argv[1])                      # columns: lon, lat, [alt]
    result = to_local_grid(table, int(sys.argv[2]))       # e.g. 32610 for UTM 10N
    result.to_csv(sys.stdout, index=False)

Run it against a coordinate export and inspect the status column:

python wgs84_to_local.py ./gcps_wgs84.csv 32610

Two lines carry the entire accuracy budget. Transformer.from_crs(4326, target_epsg, always_xy=True) builds a deterministic operation with the correct axis order — drop always_xy and you reintroduce the ±500 m swap. Passing all three arrays into a single tf.transform(lon, lat, alt) keeps the vertical component in the same datum-aware hop instead of letting elevation drift through an untracked default. The same discipline, scaled to chunked streaming and concurrent workers, is detailed in coordinate transformation workflows in PyProj.

Edge-case matrix

Real coordinate exports are never clean. The table lists the input variants that break a first draft and the handling the routine above encodes — or the one-line change that adds it.

| Input variant | Symptom if unhandled | Expected handling |
| --- | --- | --- |
| `(lat, lon)` axis order | Clean ±500 m systematic offset | `always_xy=True` pins `(lon, lat)` input ordering |
| Missing NTv2/geoid grid | Sub-decimetre datum smear, no error | Set `PROJ_DATA`; verify with `pyproj.datadir.get_data_dir()` |
| Invalid / wrong EPSG | `ProjError: no operation available` | `try/except ProjError`, exit with the offending code |
| Point outside zone bounds | `inf`/`NaN` in easting or northing | `FAIL_NAN` status flags the row before it reaches the solver |
| Wrong UTM hemisphere | Negative or absurd northing | `FAIL_BOUNDS` status catches sign violations |
| No `alt` column | `KeyError` on elevation | `df.get("alt", ...)` defaults height to zero |
| 10⁶+ row log | OOM kill mid-run | Stream in 25k–50k chunks (parent pipeline) |
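
The two status gates in the matrix can be exercised on plain scalars, without pyproj. The sketch below uses a hypothetical `classify` function and assumes that failed points come back as either `inf` or `NaN`, so it tests for finiteness instead of for `NaN` alone. The negative-coordinate rule mirrors `FAIL_BOUNDS` and would need relaxing for grids where negative values are legitimate.

```python
import math


def classify(easting, northing):
    """Return PASS, FAIL_NAN or FAIL_BOUNDS for one transformed point.

    Hypothetical helper mirroring the status labels used on this page.
    """
    # Non-finite output (inf or NaN) means the point failed to project.
    if not (math.isfinite(easting) and math.isfinite(northing)):
        return "FAIL_NAN"
    # Negative coordinates are treated as a zone or hemisphere mistake.
    if easting < 0 or northing < 0:
        return "FAIL_BOUNDS"
    return "PASS"


statuses = [
    classify(float("inf"), 4_180_000.0),
    classify(float("nan"), 4_180_000.0),
    classify(551_000.0, -10.0),
    classify(551_000.0, 4_180_000.0),
]
```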

Verify the fix worked

Before the local-grid table reaches bundle adjustment, assert that nothing failed the gate and — when you have surveyed monuments to check against — that the residual against known control is within tolerance. The horizontal root-mean-square error over the passing subset is

\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(E_i - E_i^{\text{ref}})^2 + (N_i - N_i^{\text{ref}})^2\right]}

import logging

import numpy as np
import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

def verify(out: pd.DataFrame, df: pd.DataFrame, tol_m: float = 0.02) -> None:
    # Gate 1: no row may carry a FAIL_* status.
    failed = out.loc[out["status"] != "PASS"]
    assert failed.empty, f"{len(failed)} rows failed: {failed['status'].value_counts().to_dict()}"
    if {"ref_e", "ref_n"}.issubset(df.columns):
        # Gate 2: compare positionally, since `out` has a fresh index and `df` may not.
        d_e = out["easting"].to_numpy() - df["ref_e"].to_numpy()
        d_n = out["northing"].to_numpy() - df["ref_n"].to_numpy()
        rmse = float(np.sqrt((d_e ** 2 + d_n ** 2).mean()))
        assert rmse <= tol_m, f"RMSE {rmse:.4f} m exceeds tolerance {tol_m} m"
        logging.info("verified %d coords, RMSE %.4f m within tolerance", len(out), rmse)

A clean run returns without raising and, when reference columns are present, logs the count and RMSE. Any failed assertion names the failure class, so you can pull the source rows instead of chasing a warped orthomosaic later. A 0.02 m horizontal tolerance aligns with typical RTK/PPK control standards; see setting accuracy thresholds for survey projects for choosing project-specific budgets.

Common error messages

ProjError: no operation available between EPSG:4326 and EPSG:xxxx. PROJ could not find a transformation path, almost always because the required grid shift file is absent. Confirm the grid directory with pyproj.datadir.get_data_dir(), install the relevant grids, and re-check that the target EPSG code actually exists.

Every transformed point is offset by a constant ±500 m or more. The longitude and latitude were swapped. Rebuild the transformer with always_xy=True so the (lon, lat) input order is honoured.

Output eastings or northings are inf or NaN. The input coordinates fall outside the valid bounds of the target projection — typically the wrong UTM zone or hemisphere for that longitude. Filter or re-zone before transforming; the FAIL_NAN status isolates these rows automatically.
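
Re-zoning can be partly automated by deriving the WGS84 UTM EPSG code from each point's longitude and latitude. The sketch below uses a hypothetical `utm_epsg` helper and the standard 6-degree zone rule, with EPSG 326xx for the northern hemisphere and 327xx for the southern. It ignores the special zone boundaries around Norway and Svalbard and does not check UTM's polar limits, so treat it as a starting point.

```python
import math


def utm_epsg(lon, lat):
    """Return the WGS84 UTM EPSG code for a lon/lat in degrees.

    Hypothetical helper: standard 6-degree zones only, with no
    Norway/Svalbard exceptions and no polar-limit check.
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinate out of range: lon={lon}, lat={lat}")
    # Zones count eastward from 180 W; lon == 180 is clamped into zone 60.
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    base = 32600 if lat >= 0 else 32700
    return base + zone


north_code = utm_epsg(-122.4, 37.8)   # San Francisco area
south_code = utm_epsg(151.2, -33.9)   # Sydney area
```

Grouping rows by the returned code before transforming keeps each point inside the zone it belongs to.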

When to escalate

This single-table routine is deliberately narrow. Move up to the parent workflow when:

The input no longer fits comfortably in memory — multi-million-row RTK logs need the chunked, ParquetWriter-backed streaming pipeline on coordinate transformation workflows in PyProj.
The grid is correct but the mosaic still warps — a clean transform with residuals concentrated in one corner is a residual-distribution problem handled in distributing GCP errors across orthomosaics, not a transformation problem.
You need to certify the network against a stated accuracy budget — outlier policy and tolerance selection belong to setting accuracy thresholds for survey projects.
