Custom NumPy dtypes

Published 2025-05-08 • Updated 2025-05-09

Use custom (structured) dtypes to represent tabular binary data in NumPy without converting it to lists or Python objects:

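import numpy as np

# One record per row: a fixed-width string followed by two floats.
gene_dtype = np.dtype([
    ('gene_id', 'U10'),  # Unicode string, at most 10 characters
    ('expr', 'f4'),      # float32
    ('conf', 'f4')       # float32
])
# Each record must be a tuple, not a list.
data = np.array([
    ('a', 5.34, 0.97),
    ('b', 1.2, 0.96),
    ('c', 5.1, 0.95),
    ('d', 10.2, 0.91)
], dtype=gene_dtype)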

Field access (data['expr']) and basic slicing still return zero-copy views, and np.memmap lets you map files that are too large to load into memory. As a rule of thumb, use a plain ndarray for homogeneous data; use custom dtypes for heterogeneous data that needs to map cleanly to binary files or C structs; use pandas for higher-level analytics.
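
As a minimal sketch of the zero-copy and np.memmap points, the snippet below rebuilds the same array, checks that field access and slicing share memory with it, then writes the records to a raw binary file and maps them back. The file name genes.bin, the temporary directory, and the thresholds are assumptions chosen for illustration, not part of the original example.

```python
import os
import tempfile

import numpy as np

# Same layout as gene_dtype above; redefined so this sketch runs on its own.
gene_dtype = np.dtype([
    ('gene_id', 'U10'),
    ('expr', 'f4'),
    ('conf', 'f4'),
])
data = np.array([
    ('a', 5.34, 0.97),
    ('b', 1.2, 0.96),
    ('c', 5.1, 0.95),
    ('d', 10.2, 0.91),
], dtype=gene_dtype)

# Selecting a field or slicing rows returns a view, not a copy.
expr = data['expr']
first_two = data[:2]
shares_memory = (np.shares_memory(expr, data)
                 and np.shares_memory(first_two, data))

# Boolean masks, by contrast, produce a new array.
confident = data[data['conf'] > 0.93]

# Write the raw records to disk (hypothetical temporary path).
path = os.path.join(tempfile.mkdtemp(), 'genes.bin')
data.tofile(path)

# Map the file instead of loading it; the record count is inferred
# from the file size and the dtype's item size.
mapped = np.memmap(path, dtype=gene_dtype, mode='r')
high_expr_ids = mapped['gene_id'][mapped['expr'] > 5.0].tolist()
```

Note that tofile writes records with no header, so whoever reads the file needs the same dtype definition to interpret it.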