Memory Management for Large Point Clouds

When a drone survey grows from a single test flight into a corridor or regional dataset, the dense point cloud stops fitting in RAM and naive Open3D or numpy pipelines begin to thrash, swap, or trigger kernel out-of-memory kills mid-reconstruction. This page shows surveying technicians and Python GIS developers how to process UAV-derived point clouds that are far larger than physical memory by streaming LAS/LAZ files in bounded chunks, validating the coordinate reference system at the I/O boundary, and applying memory backpressure so multi-day batch runs finish without crashing.

Audience prerequisites. You should be comfortable with Python 3.10+, virtual environments, and basic point-cloud concepts (XYZ, RGB, intensity, classification). The workflow here assumes a workstation with at least 16 GB RAM and an NVMe or SSD scratch disk — out-of-core streaming trades RAM for disk bandwidth, so spinning media will bottleneck throughput. This workflow is the memory-orchestration stage referenced by Automated Image Alignment & Feature Matching Workflows; it consumes the dense cloud that bundle adjustment and densification produce upstream.

Prerequisites

Install the following into a clean virtual environment. Versions are the minimums validated for the streaming and VLR-parsing behavior used below.

Library	Version	Install command
`laspy`	≥ 2.5	`pip install "laspy>=2.5"`
`lazrs` (LAZ backend)	≥ 0.5	`pip install "laspy[lazrs]" "lazrs>=0.5"`
`pyproj`	≥ 3.6	`pip install "pyproj>=3.6"`
`numpy`	≥ 1.26	`pip install "numpy>=1.26"`
`psutil`	≥ 5.9	`pip install "psutil>=5.9"`

laspy 2.x ships a streaming reader (LasReader.chunk_iterator) and an incremental writer (LasWriter.write_points) that are the foundation of every snippet on this page. The lazrs extra is required to read and write compressed .laz; without it (or the alternative laszip backend) laspy can only handle uncompressed .las.

Conceptual architecture

The core idea is that a point cloud never has to exist in memory in its entirety. Instead of laspy.read() (which materializes every point at once), you open the file as a reader, pull a fixed number of points per iteration, transform and filter that slice, append the survivors to an output file, and discard the slice before the next one arrives. Peak memory is then a function of chunk_size, not of total file size — a 12 GB cloud and a 1.2 TB cloud both process inside the same bounded working set.
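The same bounded-working-set idea can be shown without laspy. The sketch below is a hypothetical illustration (iter_chunks and streaming_z_stats are not part of any library): it walks an (N, 3) XYZ array in fixed-size slices and keeps only a running count and Z range between iterations. When the array is a numpy.memmap, each slice is a view paged in from disk, so only the current slice needs to be resident.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_chunks(points: np.ndarray, chunk_size: int) -> Iterator[np.ndarray]:
    """Yield consecutive slices of at most chunk_size rows.

    For a numpy.memmap the slices are views backed by the file on disk, so only
    the rows of the current slice need to be resident in RAM.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(points), chunk_size):
        yield points[start:start + chunk_size]


def streaming_z_stats(points: np.ndarray, chunk_size: int) -> Tuple[int, float, float]:
    """Return (count, z_min, z_max) while holding only one chunk at a time.

    Assumes an (N, 3) XYZ array. Only three scalars survive between
    iterations, so nothing accumulates across chunks.
    """
    count, z_min, z_max = 0, np.inf, -np.inf
    for chunk in iter_chunks(points, chunk_size):
        z = chunk[:, 2]
        count += len(z)
        z_min = min(z_min, float(z.min()))
        z_max = max(z_max, float(z.max()))
    return count, z_min, z_max
```

With a raw float64 XYZ dump on disk you could pass a memory-mapped view such as np.memmap("xyz.f8", dtype="f8", mode="r").reshape(-1, 3); the file name and layout are assumptions for illustration.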

Three guarantees must hold for this to be safe in production: the CRS is read once from the header and never silently reprojected mid-stream, every chunk is independently error-contained so one bad block cannot abort the survey, and a memory monitor applies backpressure before the OS starts swapping. The structuring conventions from structuring drone imagery for batch processing determine how flight blocks map to files here, and the datum handling mirrors coordinate transformation workflows in pyproj.

1. Estimate the memory budget before you load anything

Before choosing a chunk size, compute how much memory the full cloud would consume. A dense cloud from a 500-image survey easily exceeds 200 million points; at 48–64 bytes per point (XYZ, RGB, intensity, classification, optional normals) that is roughly 9.6–12.8 GB of contiguous array data before indexing or temporary buffers. This estimate tells you immediately whether streaming is mandatory.

def estimate_point_cloud_bytes(num_points: int, bytes_per_point: int = 56) -> float:
    """Return the approximate in-memory size (GiB) of a fully materialized cloud.

    bytes_per_point defaults to 56. The core attributes, XYZ (3x8) + RGB (3x2) +
    intensity (2) + classification (1), total 33 bytes; the default adds headroom
    for optional normals (3x4), extra dimensions, and struct padding. Transformations
    that copy arrays can momentarily double this, so size your headroom accordingly.
    """
    return (num_points * bytes_per_point) / (1024 ** 3)


if __name__ == "__main__":
    for n in (50_000_000, 200_000_000, 800_000_000):
        print(f"{n:>13,} points -> ~{estimate_point_cloud_bytes(n):6.1f} GiB resident")

If the resident estimate approaches half your physical RAM, switch to the streaming pipeline below rather than trying to load and decimate in one pass.
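The half-of-RAM rule can be written as a small decision helper. This is a sketch: should_stream is a hypothetical name, and it reuses the 56-byte default from the estimate above.

```python
def should_stream(
    num_points: int,
    total_ram_bytes: int,
    bytes_per_point: int = 56,
    ram_fraction: float = 0.5,
) -> bool:
    """True when a fully materialized cloud would reach ram_fraction of RAM."""
    if not 0 < ram_fraction <= 1:
        raise ValueError("ram_fraction must be in (0, 1]")
    resident_bytes = num_points * bytes_per_point
    # Streaming is mandatory once the resident estimate crosses the threshold.
    return resident_bytes >= ram_fraction * total_ram_bytes
```

In practice you would feed it the header's point count (laspy.open(path).header.point_count, which reads only the header) and the machine's total memory from psutil.virtual_memory().total.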

2. Extract and validate the CRS from LAS/LAZ VLRs

Coordinate reference system mismatches are a silent cause of memory bloat: loading points without an explicit CRS lets downstream operations trigger implicit reprojections that duplicate coordinate arrays and inflate the footprint 2–3×. In laspy 2.x the CRS lives in Variable Length Records (VLRs). Recent laspy releases also offer a LasHeader.parse_crs() convenience method, but scanning the VLRs yourself keeps the parsing path explicit and lets you log exactly which record failed. The portable source is the OGC WKT VLR (user_id="LASF_Projection", record_id=2112), which you parse with pyproj.CRS.from_wkt(). The broader rules for this live in managing coordinate reference systems in GDAL.

import logging
from typing import Optional

import laspy
from pyproj import CRS


def extract_crs_from_header(header: laspy.LasHeader) -> Optional[CRS]:
    """Extract a CRS from a LAS/LAZ header by scanning VLRs for the WKT record.

    Returns None when no recognizable projection VLR is present, leaving the
    caller to decide whether to proceed without reprojection or to abort.
    """
    for vlr in header.vlrs:
        if vlr.user_id == "LASF_Projection" and vlr.record_id == 2112:
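            try:
                # laspy 2.x parses this record into a known-VLR class, so read the
                # payload through record_data_bytes() rather than a raw attribute.
                wkt = vlr.record_data_bytes().decode("utf-8").rstrip("\x00")
                return CRS.from_wkt(wkt)
            except Exception as exc:  # malformed or truncated WKT
                logging.warning("Failed to parse WKT VLR: %s", exc)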
    return None

Validate the CRS once, at the I/O boundary. If the tie-point network was solved under a specific projection, forcing a late-stage datum shift introduces geometric distortion and needless memory duplication — transform at ingest or not at all.

3. Process points in bounded chunks with memory backpressure

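Each chunk is filtered (and optionally reprojected) in isolation and no reference is kept to earlier chunks, so a failure can be traced to the block that caused it and the points already written stay valid on disk. The memory monitor reads system-wide utilization (psutil.virtual_memory().percent) and forces garbage collection before the OS reaches for swap; if usage is still above the threshold afterwards, it raises MemoryError so the run stops before the kernel kills it.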

import gc
import logging
from typing import Optional

import numpy as np
import psutil
from pyproj import Transformer


def memory_usage_percent() -> float:
    """Current system memory utilization as a percentage (0–100)."""
    return psutil.virtual_memory().percent


def apply_backpressure(threshold: float) -> None:
    """Collect garbage if memory is high; abort cleanly if it stays high."""
    if memory_usage_percent() > threshold:
        logging.warning("Memory at %.1f%%; collecting garbage.", memory_usage_percent())
        gc.collect()
        if memory_usage_percent() > threshold:
            raise MemoryError("Memory threshold breached; aborting before OOM kill.")


def process_chunk(points, transformer: Optional[Transformer] = None):
    """Filter (and optionally reproject) one chunk; return the surviving points."""
    if transformer is not None:
        # Transformer.transform returns a tuple of arrays, never a 2D array.
        # Reprojected values must still fit the header's scale/offset on write.
        x, y, z = transformer.transform(
            np.asarray(points.x), np.asarray(points.y), np.asarray(points.z)
        )
        points.x, points.y, points.z = x, y, z
    # Example filter: drop the lowest 2% of Z values (below-ground noise floor).
    # The threshold is computed per chunk, not globally across the file.
    z = np.asarray(points.z)
    z_min = np.percentile(z, 2)
    return points[z >= z_min]

Replace the percentile filter with your own classification, statistical-outlier, or decimation step — the contract is simply “points in, fewer or transformed points out,” with no reference held to the previous chunk.
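The error-containment guarantee from the architecture section can be layered on top of that contract. The sketch below is hypothetical (contain_chunk_errors is not a laspy function): it applies any per-chunk callable, logs and skips chunks whose processing raises, and records their indices. MemoryError is re-raised on purpose so a backpressure abort is never mistaken for a bad block.

```python
import logging
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def contain_chunk_errors(
    chunks: Iterable[T],
    process: Callable[[T], R],
    failed: List[int],
) -> Iterator[R]:
    """Yield process(chunk) for each chunk, skipping chunks that raise.

    Indices of skipped chunks are appended to failed so they can be re-run or
    reported. MemoryError propagates so backpressure still stops the run.
    """
    for index, chunk in enumerate(chunks):
        try:
            result = process(chunk)
        except MemoryError:
            raise
        except Exception:
            logging.exception("Chunk %d failed; skipping it.", index)
            failed.append(index)
            continue
        yield result
```

In the driver you could iterate over contain_chunk_errors(reader.chunk_iterator(chunk_size), process_chunk, failed) and log failed at the end. Errors raised by the reader itself while decoding a chunk occur inside the iterator, outside the try block, so they still abort the run.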

4. Stream the file out-of-core and serialize incrementally

The driver ties the pieces together: open the reader, read the CRS once, open a writer seeded with the source header, then iterate. Each processed chunk is flushed with write_points, so peak memory stays bounded by chunk_size regardless of total file size. Writing .laz (laspy chooses compression from the output extension) reclaims 60–80% of the disk footprint and lowers I/O pressure on the next read cycle.

import logging
from pathlib import Path

import laspy

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s"
)


def stream_point_cloud(
    input_path: Path,
    output_path: Path,
    chunk_size: int = 750_000,
    memory_threshold: float = 82.0,
) -> int:
    """Stream a LAS/LAZ file in chunks, validate CRS, filter, and write out.

    Returns the number of points written. Peak memory is governed by chunk_size,
    so the same call handles a 12 GiB or a 1.2 TiB cloud unchanged.
    """
    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_path}")

    written = 0
    with laspy.open(input_path) as reader:
        header = reader.header
        source_crs = extract_crs_from_header(header)
        if source_crs is None:
            logging.warning("No CRS VLR found; proceeding without reprojection.")
        else:
            logging.info("Validated CRS: %s", source_crs.name)

        with laspy.open(output_path, mode="w", header=header) as writer:
            for chunk in reader.chunk_iterator(chunk_size):
                apply_backpressure(memory_threshold)
                survivors = process_chunk(chunk)
                if len(survivors):
                    # LasWriter exposes write_points; append_points belongs to LasAppender.
                    writer.write_points(survivors)
                    written += len(survivors)
                logging.info("Flushed chunk; running total %s points.", f"{written:,}")

    logging.info("Wrote %s points to %s", f"{written:,}", output_path)
    return written


if __name__ == "__main__":
    stream_point_cloud(
        Path("survey_block_01.laz"),
        Path("survey_block_01_processed.laz"),
        chunk_size=750_000,
        memory_threshold=82.0,
    )

For corridor-scale jobs that span many flight blocks, drive stream_point_cloud from the orchestration layer in parallel processing strategies for alignment, assigning one block per worker so RAM is bounded per process rather than across the whole survey.
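A minimal sketch of that one-block-per-worker pattern, assuming each block is a separate file and the worker returns a point count the way stream_point_cloud does. run_blocks and max_workers_for_budget are hypothetical helpers; with ProcessPoolExecutor the worker must be a picklable top-level function.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Dict, Sequence, Type


def max_workers_for_budget(available_bytes: int, per_worker_bytes: int) -> int:
    """Number of workers that fit in available_bytes if each peaks at per_worker_bytes."""
    if per_worker_bytes <= 0:
        raise ValueError("per_worker_bytes must be positive")
    return max(1, available_bytes // per_worker_bytes)


def run_blocks(
    block_paths: Sequence[str],
    worker: Callable[[str], int],
    max_workers: int,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> Dict[str, int]:
    """Run worker(block_path) once per flight block and collect the results.

    Each process holds at most one chunk working set, so total RAM is roughly
    max_workers times the per-process peak.
    """
    results: Dict[str, int] = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {path: pool.submit(worker, path) for path in block_paths}
        for path, future in futures.items():
            # result() re-raises any exception raised inside the worker.
            results[path] = future.result()
    return results
```

A real worker would derive the output path from the block path and call stream_point_cloud. Passing ThreadPoolExecutor as executor_cls is handy for dry runs, but threads share one process and therefore one memory budget.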

Parameter deep-dive

Parameter	Type	Default	Valid range	Effect on quality vs. performance
`chunk_size`	int	`750_000`	100k–5M	Larger chunks raise throughput but linearly raise peak RAM (~36–48 MB of raw arrays per 750k points at 48–64 bytes/point). Tune so one chunk plus a copy fits comfortably in free RAM.
`memory_threshold`	float	`82.0`	60–95	Backpressure trigger as % system memory. Lower is safer (earlier GC, fewer OOM kills) but pauses more often; above ~90 you risk swapping before GC runs.
`bytes_per_point`	int	`56`	32–96	Only affects the budgeting estimate. Raise it when carrying normals or extra dimensions so headroom math stays honest.
Z-filter percentile	float	`2`	0–10	Higher values strip more low noise but can erase real ground returns in steep terrain — verify against known control.
Overlap buffer (spatial chunking)	float (m)	`1.0`	0.5–3.0	When partitioning spatially rather than by index, an overlap prevents edge artifacts during filtering/classification at the cost of reprocessing boundary points.
LAZ write backend	str	`lazrs`	`lazrs`/`laszip`	Compression engine; `lazrs` is pure-Rust and parallel-friendly, `laszip` matches legacy toolchains bit-for-bit.
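Two of these parameters can be derived rather than guessed. The sketch below is illustrative only: suggest_chunk_size and buffered_tiles are hypothetical helpers, the 25% budget fraction and two in-flight copies are assumptions, and the 100k–5M clamp and 1.0 m overlap come from the table.

```python
import math
from typing import List, Tuple

Tile = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def suggest_chunk_size(
    free_bytes: int,
    bytes_per_point: int = 56,
    copies: int = 2,
    budget_fraction: float = 0.25,
    lo: int = 100_000,
    hi: int = 5_000_000,
) -> int:
    """Largest chunk whose working set (chunk plus copies) fits in a slice of free RAM."""
    budget = free_bytes * budget_fraction
    size = int(budget // (bytes_per_point * copies))
    # Clamp to the valid range; note that lo can exceed the budget on tiny machines.
    return max(lo, min(hi, size))


def buffered_tiles(bounds: Tile, tile_size: float, overlap: float = 1.0) -> List[Tile]:
    """Split bounds into a grid of tiles, each grown by overlap metres per side.

    The core tiles partition the extent; neighbouring buffered tiles share a
    strip of points that must be clipped back to the core tile on output.
    """
    xmin, ymin, xmax, ymax = bounds
    nx = math.ceil((xmax - xmin) / tile_size)
    ny = math.ceil((ymax - ymin) / tile_size)
    tiles: List[Tile] = []
    for j in range(ny):
        for i in range(nx):
            x0 = xmin + i * tile_size
            y0 = ymin + j * tile_size
            tiles.append((
                max(xmin, x0 - overlap),
                max(ymin, y0 - overlap),
                min(xmax, x0 + tile_size + overlap),
                min(ymax, y0 + tile_size + overlap),
            ))
    return tiles
```

Integer tile counts (math.ceil) are used instead of a floating-point while loop so rounding error cannot produce a sliver tile at the edge of the extent.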

Verification and output inspection

Never trust a long batch run silently. After writing, assert that the point count is plausible, the output is readable, and — critically — the CRS survived the round trip unchanged. A reprojection that fired without your knowledge will show up as a CRS mismatch or an implausible bounding box.

from pathlib import Path

import laspy

# extract_crs_from_header is the helper defined in step 2.


def verify_output(input_path: Path, output_path: Path) -> None:
    with laspy.open(input_path) as src, laspy.open(output_path) as dst:
        src_header, dst_header = src.header, dst.header

        # 1. The filter only removes points: output must be non-empty and no larger.
        assert 0 < dst_header.point_count <= src_header.point_count, (
            f"Implausible count: {dst_header.point_count} vs {src_header.point_count}"
        )

        # 2. The CRS must be equivalent under pyproj equality (no silent reprojection).
        src_crs = extract_crs_from_header(src_header)
        dst_crs = extract_crs_from_header(dst_header)
        assert src_crs == dst_crs, f"CRS drifted: {src_crs} -> {dst_crs}"

        # 3. The output bounding box must lie inside the input envelope on X, Y and Z.
        for axis in range(3):
            assert dst_header.mins[axis] >= src_header.mins[axis] - 1e-6, f"min, axis {axis}"
            assert dst_header.maxs[axis] <= src_header.maxs[axis] + 1e-6, f"max, axis {axis}"

    print("Verification passed: count, CRS, and bounds are consistent.")

Wire this assertion block into CI or a post-run hook so a regression in chunking or VLR handling fails loudly instead of producing a quietly corrupt deliverable.

Troubleshooting

The process is killed with no Python traceback (exit code 137)

Exit code 137 (128 + 9, i.e. SIGKILL) usually means the Linux OOM killer terminated the process: the kernel reclaimed memory before Python could raise MemoryError. Lower memory_threshold (try 75) and reduce chunk_size so each iteration’s working set is smaller. Also confirm you are not holding references to previous chunks (for example, appending them to a list); each chunk must be released before the next one is read.
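When the pipeline is launched from a Python supervisor, note that the subprocess module reports death-by-signal as a negative return code (-9), whereas a shell reports the same event as 137. A small sketch that recognizes both forms; killed_by_sigkill is a hypothetical helper, and the signal number is hard-coded because signal.SIGKILL does not exist on Windows.

```python
from typing import Optional

SIGKILL = 9  # POSIX signal number used by the Linux OOM killer


def killed_by_sigkill(returncode: Optional[int]) -> bool:
    """True if a child process ended via SIGKILL.

    subprocess uses -9 for death by signal 9; shells report 128 + 9 = 137.
    """
    if returncode is None:  # process is still running
        return False
    return returncode in (-SIGKILL, 128 + SIGKILL)
```

A SIGKILL is not proof of an OOM kill on its own; check the kernel log (for example with dmesg) for an out-of-memory entry naming the process.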

`laspy` reports that no LAZ backend is available when reading a .laz file

The LAZ codec is an optional dependency. Install the backend extra with pip install "laspy[lazrs]" (or laspy[laszip]). Uncompressed .las works without it, but every .laz read or write needs a codec.

`extract_crs_from_header` returns None on a file I know is georeferenced

The file likely stores its CRS in the legacy GeoTIFF key VLRs (record ids 34735–34737) rather than the OGC WKT VLR (2112). Older exporters and some lidar tools do this. Fall back to parsing the GeoKey directory, or run the file through a converter that writes the WKT VLR, before treating “no CRS” as fatal.
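A minimal sketch of that GeoKey fallback, assuming you already hold the raw payload of the 34735 GeoKeyDirectory VLR (in laspy 2.x, the VLR's record_data_bytes() should return it). The directory is a run of little-endian uint16 values: a 4-value header whose last entry is the key count, then 4 values per key (KeyID, TIFFTagLocation, Count, Value_Offset). Only inline values (location 0) of ProjectedCSTypeGeoKey (3072) and GeographicTypeGeoKey (2048) are read; epsg_from_geokey_directory is a hypothetical name and vertical keys are ignored.

```python
import struct
from typing import Dict, Optional

GEOGRAPHIC_CS_KEY = 2048  # GeographicTypeGeoKey
PROJECTED_CS_KEY = 3072   # ProjectedCSTypeGeoKey
USER_DEFINED = 32767      # GeoTIFF "user-defined" marker, not an EPSG code


def epsg_from_geokey_directory(payload: bytes) -> Optional[int]:
    """Return the EPSG code stored inline in a GeoKeyDirectory payload, if any."""
    if len(payload) < 8:
        return None
    # Header: KeyDirectoryVersion, KeyRevision, MinorRevision, NumberOfKeys.
    _, _, _, num_keys = struct.unpack_from("<4H", payload, 0)
    inline_values: Dict[int, int] = {}
    for i in range(num_keys):
        offset = 8 + i * 8
        if offset + 8 > len(payload):
            break  # truncated directory
        key_id, location, _count, value = struct.unpack_from("<4H", payload, offset)
        if location == 0:  # value is stored inline in Value_Offset
            inline_values[key_id] = value
    # Prefer the projected CRS over the geographic one when both are present.
    for key in (PROJECTED_CS_KEY, GEOGRAPHIC_CS_KEY):
        code = inline_values.get(key)
        if code and code != USER_DEFINED:
            return code
    return None
```

A returned code can then be turned into a CRS with pyproj.CRS.from_epsg(code); a None result means the directory uses user-defined or referenced parameters that need a full GeoTIFF parser.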

Coordinates shifted by tens of meters after processing

A reprojection ran when none was intended, usually because a Transformer was passed into process_chunk. Confirm transformer is None for same-CRS workflows. If you do need a datum shift, build the transformer with always_xy=True and verify direction against a surveyed control point — see coordinate transformation workflows in pyproj.
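The control-point check can be scripted so a wrong direction or swapped axis order shows up as a large residual. This is a sketch: max_control_residual is a hypothetical helper that accepts any callable taking (x, y, z) scalars and returning a 3-tuple, such as the transform method of a Transformer built with always_xy=True.

```python
import math
from typing import Callable, Sequence, Tuple

XYZ = Tuple[float, float, float]


def max_control_residual(
    transform: Callable[[float, float, float], XYZ],
    source_points: Sequence[XYZ],
    surveyed_points: Sequence[XYZ],
) -> float:
    """Largest 3D distance between transformed source points and surveyed control."""
    if len(source_points) != len(surveyed_points):
        raise ValueError("source and surveyed point lists must align")
    worst = 0.0
    for (x, y, z), target in zip(source_points, surveyed_points):
        tx, ty, tz = transform(x, y, z)
        worst = max(worst, math.dist((tx, ty, tz), target))
    return worst
```

Compare the result against your survey's accuracy tolerance; a residual of tens of metres or more usually indicates an inverted transform or swapped axes rather than datum noise.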

Throughput is far lower than expected despite low memory use

Out-of-core streaming is disk-bound. Writing intermediate .laz to a spinning disk or a network share will dominate runtime. Move the scratch path to local NVMe, and raise chunk_size (memory permitting) to amortize per-chunk overhead across more points.

Output file is empty even though the run reported "success"

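Your per-chunk filter discarded every point. The 2nd-percentile Z example cannot do this on its own: every point at or above the threshold is kept, so it retains roughly 98% of each chunk, and on flat terrain or a single-elevation surface the ties at the threshold keep even more. An empty output therefore points to a replacement filter (for example a classification mask that matches no codes present in the file) or to an input with no points. Fix the filter, and keep the 0 < dst_header.point_count check from the verification step so this fails loudly.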
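To check a filter's behaviour on real data, measure how much of a chunk it keeps before writing. The hypothetical helper below reports the fraction retained by a `z >= percentile(z)` filter like the one in step 3; on a perfectly flat chunk every point ties at the threshold and is kept.

```python
import numpy as np


def percentile_filter_retention(z: np.ndarray, percentile: float = 2.0) -> float:
    """Fraction of points kept by a `z >= np.percentile(z, percentile)` filter."""
    z = np.asarray(z, dtype=float)
    if z.size == 0:
        return 0.0
    threshold = np.percentile(z, percentile)
    return float(np.count_nonzero(z >= threshold)) / z.size
```

Logging this fraction per chunk (or the equivalent len(survivors) / len(chunk) for any custom filter) makes an over-aggressive filter visible long before the verification step runs.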
