How to Read Binary Files in Python (With Examples, struct, mmap, and NumPy)

To read a binary file in Python, open it using open("file.bin", "rb"), then use file.read() to access raw bytes or struct.unpack() to decode structured binary values.

Example:
import struct

with open("data.bin", "rb") as f:
    value = struct.unpack("<I", f.read(4))[0]  # Little-endian unsigned 32-bit integer
    print(value)

What Is a Binary File?

A binary file stores data in raw byte format rather than human-readable text. These files are used for:

  • Images (BMP, PNG, JPEG)
  • Audio and video files
  • Executables
  • Compressed archives
  • Database dumps
  • Network packets
  • Custom application data formats

Unlike text files, binary files must be interpreted correctly based on their structure and encoding.
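As a small illustration of why structure matters, the same four bytes can decode to very different values depending on the type you assume. The byte string below is made up for the example; it is a sketch, not taken from any particular file format.

```python
import struct

raw = bytes([0x00, 0x00, 0x80, 0x3F])  # four arbitrary example bytes

as_uint = struct.unpack("<I", raw)[0]   # read as little-endian unsigned 32-bit int
as_float = struct.unpack("<f", raw)[0]  # read as little-endian 32-bit float

print(as_uint)   # 1065353216
print(as_float)  # 1.0
```

Neither reading is "right" on its own. Only the file's specification tells you which one the writer intended.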

Opening a Binary File in Python

Always use binary read mode ("rb"), which returns bytes objects with no text decoding or newline translation:
with open("file.bin", "rb") as f:
   data = f.read()

Why use with?

  • Ensures proper file closure
  • Prevents leaked file handles, even when an exception is raised
  • Follows Python best practices
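The closure guarantee can be checked directly. The sketch below writes a throwaway temporary file, simulates a parsing failure inside the with block, and then inspects the file object. The file name and the simulated error are assumptions made for the demo.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
with open(path, "wb") as out:
    out.write(b"\x01\x02\x03")

handle = None
try:
    with open(path, "rb") as f:
        handle = f
        f.read(1)
        raise ValueError("simulated parsing failure")
except ValueError:
    pass

closed_after_error = handle.closed
print(closed_after_error)  # True: the file was closed despite the exception
os.remove(path)
```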

Reading Raw Bytes

Read Entire File

with open("file.bin", "rb") as f:
   data = f.read()
print(type(data))  # <class 'bytes'>

Read in Chunks (Recommended for Large Files)

chunk_size = 4096
with open("large.bin", "rb") as f:
    while chunk := f.read(chunk_size):  # Python 3.8+; read() returns b"" at EOF
        process(chunk)  # process() is a placeholder for your own handler

Reading in chunks:

  • Keeps memory usage bounded by the chunk size
  • Avoids MemoryError on files larger than available RAM
  • Lets streaming applications start processing before the whole file has been read
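One common use of chunked reading is hashing a file of any size. The following is a minimal sketch: the helper name sha256_of_file and the 64 KiB chunk size are illustrative choices, and the demo file is generated on the fly.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file without loading it into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway 200,000-byte file.
payload = b"\xab" * 200_000
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(payload)
    demo_path = tmp.name

file_hash = sha256_of_file(demo_path)
os.remove(demo_path)
print(file_hash == hashlib.sha256(payload).hexdigest())  # True
```

Only one chunk is held in memory at a time, so the same function works unchanged on files far larger than RAM.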

Decoding Structured Binary Data with struct

Binary files often contain structured fields like integers and floats.

Python’s struct module converts raw bytes into Python values.

Common Format Codes

Format   C type                Standard size
b        signed char           1 byte
B        unsigned char         1 byte
h        signed short          2 bytes
H        unsigned short        2 bytes
i        signed int            4 bytes
I        unsigned int          4 bytes
q        signed long long      8 bytes
Q        unsigned long long    8 bytes
f        float                 4 bytes
d        double                8 bytes
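You can confirm these sizes with struct.calcsize(). The short sketch below also shows one detail worth knowing: with an explicit byte-order prefix the sizes are fixed and no padding is inserted, while a format with no prefix uses the platform's native sizes and alignment.

```python
import struct

# With an explicit byte-order prefix, sizes are fixed and no padding is added.
print(struct.calcsize("<h"))   # 2
print(struct.calcsize("<i"))   # 4
print(struct.calcsize("<q"))   # 8
print(struct.calcsize("<BI"))  # 5 (1 + 4, no alignment padding)

# Without a prefix, struct uses native sizes and alignment, so this
# result can differ between machines (padding often makes it 8).
native_size = struct.calcsize("BI")
print(native_size)
```

For file parsing, an explicit prefix is usually the safer choice, because the result then does not depend on the machine running the code.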

Endianness

  • < → Little-endian
  • > → Big-endian
  • No prefix → Native byte order, sizes, and alignment of the current machine
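To make the difference concrete, here is a small sketch that decodes the same four bytes with both byte orders. The byte string is invented for the example.

```python
import struct

raw = b"\x01\x00\x00\x00"

little = struct.unpack("<I", raw)[0]
big = struct.unpack(">I", raw)[0]
print(little)  # 1
print(big)     # 16777216

# int.from_bytes gives the same results without a format string.
print(int.from_bytes(raw, "little"))  # 1
print(int.from_bytes(raw, "big"))     # 16777216
```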

Example:

import struct
with open("data.bin", "rb") as f:
   number = struct.unpack("<I", f.read(4))[0]
   print(number)

Reading Large Binary Files Efficiently

Option 1: Memory Mapping with mmap

Best for large files requiring random access.

import mmap
with open("large.bin", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        header = mm[0:16]  # slicing copies only these 16 bytes

Advantages:

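  • The file is not loaded into memory all at once; the OS pages data in on demand
  • Random access by slicing, without explicit seek() and read() calls
  • Lower memory footprint than reading the whole file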
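As a sketch of random access in practice, the example below maps a file of fixed-size records and decodes one record by index with struct.unpack_from(). The record layout (a uint32 id followed by a float32 value) and the helper name read_record are hypothetical, chosen only for the demo.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<If")  # hypothetical record: uint32 id + float32 value

# Build a throwaway file holding 1,000 records.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    for i in range(1000):
        tmp.write(RECORD.pack(i, i * 0.5))
    path = tmp.name

def read_record(mm, index):
    """Decode record number `index` directly from the mapped file."""
    return RECORD.unpack_from(mm, index * RECORD.size)

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        record = read_record(mm, 750)

os.remove(path)
print(record)  # (750, 375.0)
```

Because every record has the same size, the byte offset is just index * RECORD.size, and no earlier records need to be read.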

Option 2: Using memoryview (Avoid Copies)

with open("file.bin", "rb") as f:
    data = f.read()
    view = memoryview(data)
    slice_part = view[0:10]  # a view into data, not a copy
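A related pattern, sketched here as one possible approach, combines memoryview with readinto() so that a single preallocated buffer is reused for every chunk. An io.BytesIO object stands in for an open binary file, and the 32-byte buffer size is arbitrary.

```python
import io

source = io.BytesIO(bytes(range(100)))  # stands in for an open binary file

buffer = bytearray(32)       # reusable, preallocated buffer
view = memoryview(buffer)

total = 0
chunks_seen = 0
while (n := source.readinto(buffer)) > 0:
    chunk = view[:n]         # a view of the filled part, no copy made
    total += sum(chunk)
    chunks_seen += 1

print(total)        # 4950
print(chunks_seen)  # 4
```

The last chunk is shorter than the buffer, which is why the code slices the view to n bytes instead of using the whole buffer.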

Reading Binary Numeric Arrays with NumPy

If your file contains large numeric datasets:

import numpy as np
arr = np.fromfile("array.bin", dtype=np.float32)
print(arr[:5])

Benefits:

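  • Fast: data is read straight into a typed array, with no Python-level loop
  • Ideal for scientific computing
  • Memory-efficient: values are stored as packed machine types, not Python objects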
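Real files often have a header before the numeric payload. The sketch below assumes a made-up 8-byte header followed by little-endian float32 values, and uses the count and offset parameters of np.fromfile() (offset requires NumPy 1.17 or later) to read only part of the data.

```python
import os
import tempfile

import numpy as np

# Write a throwaway file: an 8-byte header followed by ten float32 values.
values = np.arange(10, dtype="<f4")
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"HDR\x00\x00\x00\x00\x00")  # made-up 8-byte header
    tmp.write(values.tobytes())
    path = tmp.name

# Skip the header and read only the first five values.
arr = np.fromfile(path, dtype="<f4", count=5, offset=8)
os.remove(path)
print(arr)  # [0. 1. 2. 3. 4.]
```

Writing the dtype as "<f4" instead of np.float32 states the byte order explicitly, so the file is read the same way on any machine.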

Real-World Example: Reading a BMP Image Header

A BMP file with the common 40-byte BITMAPINFOHEADER stores width and height as little-endian signed 32-bit integers at byte offsets 18 and 22.

import struct
with open("image.bmp", "rb") as f:
   header = f.read(54)
   width, height = struct.unpack_from("<ii", header, offset=18)
print("Width:", width)
print("Height:", height)

For full image processing, use Pillow:

from PIL import Image
img = Image.open("image.bmp")
img.load()
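A slightly more defensive version of the header parser might validate the signature and length before trusting any field. The sketch below assumes the common 40-byte BITMAPINFOHEADER layout; the function name parse_bmp_header and the synthetic header values are illustrative, so no real image file is needed to run it.

```python
import struct

# Signature, file size, 2 reserved fields, pixel-data offset,
# info-header size, width, height, planes, bits per pixel.
BMP_HEADER = struct.Struct("<2sIHHIIiiHH")  # 30 bytes

def parse_bmp_header(data):
    if len(data) < BMP_HEADER.size:
        raise ValueError("Too short to be a BMP header")
    (signature, file_size, _r1, _r2, pixel_offset,
     info_size, width, height, planes, bpp) = BMP_HEADER.unpack_from(data)
    if signature != b"BM":
        raise ValueError("Missing 'BM' signature")
    return {
        "width": width,
        "height": abs(height),   # a negative height means top-down rows
        "top_down": height < 0,
        "bits_per_pixel": bpp,
        "pixel_offset": pixel_offset,
    }

# Synthetic header for a 640x480, 24-bit image (values invented for the demo).
sample = BMP_HEADER.pack(b"BM", 54 + 640 * 480 * 3, 0, 0, 54, 40, 640, 480, 1, 24)
info = parse_bmp_header(sample)
print(info["width"], info["height"], info["bits_per_pixel"])  # 640 480 24
```

Checking the two-byte "BM" signature first means that a file of the wrong type fails with a clear error instead of producing meaningless dimensions.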

Common Challenges and How to Fix Them

1. Endianness Issues

Incorrect byte order produces wrong values.
Always confirm whether data is little or big-endian.

2. Partial Reads

If f.read(n) returns fewer than n bytes on a regular file, you have reached EOF, and passing the short result to struct.unpack() raises struct.error.

Safe pattern:
data = f.read(4)
if len(data) != 4:
   raise EOFError("Unexpected end of file")
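The same check can be wrapped in a small helper so that every read in a parser is validated. This is a minimal sketch: read_exact is a hypothetical name, and an io.BytesIO stream stands in for a file that ends one field too early.

```python
import io
import struct

def read_exact(f, n):
    """Return exactly n bytes from f, or raise EOFError."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError(f"Expected {n} bytes, got {len(data)}")
    return data

stream = io.BytesIO(b"\x2a\x00\x00\x00\xff")  # one uint32 plus a stray byte

first = struct.unpack("<I", read_exact(stream, 4))[0]
print(first)  # 42

try:
    read_exact(stream, 4)
    truncated = False
except EOFError:
    truncated = True
print(truncated)  # True
```

This helper treats any short read as an error, which suits regular files. Sockets and pipes can return short reads before the end of the data, so they would need a loop that keeps reading until n bytes arrive.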

3. Memory Errors

Avoid calling f.read() with no size argument on multi-GB files, since it loads the entire file into memory.
Use chunked reading or mmap instead.

4. Unknown File Format

Check:

  • Official documentation
  • File specification
  • Hex dump tools (hexdump, xxd)
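If those command-line tools are not available, a rough equivalent can be written in a few lines of Python. The hexdump function below is an illustrative sketch, not a replacement for xxd, and the sample bytes are invented.

```python
def hexdump(data, width=16):
    """Return an xxd-like dump: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in row)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)

sample = b"BM\x36\x00\x0c\x00hello\x00\xff"
dump = hexdump(sample)
print(dump)
```

Looking at the first few dozen bytes this way often reveals a signature (such as "BM" for BMP) or embedded ASCII strings that hint at the format.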

Best Practices Checklist

  • Always open with "rb"
  • Use with statements
  • Validate byte lengths before unpacking
  • Document field structure clearly
  • Use chunked reading for large files
  • Use mmap for random access
  • Use NumPy for homogeneous numeric arrays
  • Write unit tests for parsing logic

When Should You Read Binary Files?

Common real-world use cases:

  • Parsing custom application data
  • Reading hardware sensor logs
  • Processing scientific datasets
  • Network protocol analysis
  • Reverse engineering file formats
  • Handling media file metadata

FAQs

Q) How do I read a 16-bit integer from a binary file in Python?

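A) Use struct.unpack() with the H (unsigned) or h (signed) format code:

import struct

with open("file.bin", "rb") as f:
    value = struct.unpack("<H", f.read(2))[0]  # little-endian unsigned 16-bit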

Q) What is endianness in binary files?

A) Endianness determines byte order in multi-byte values.
Little-endian stores least significant byte first.
Big-endian stores most significant byte first.

Q) What is the fastest way to read large binary files?

A) Use:

  • Chunked reading for streaming
  • mmap for random access
  • numpy.fromfile() for numeric arrays

Q) Can I convert binary data to text?

A) Yes, if the binary file contains encoded text:

text = data.decode("utf-8")

Only do this if the content is actually encoded text.

Q) How do I read a specific byte range from a binary file?

A) Use seek() to move the file pointer and read() the required bytes:

with open("file.bin", "rb") as f:
   f.seek(100)        # Move to byte 100
   data = f.read(20)  # Read next 20 bytes
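seek() also accepts a second argument that sets the reference point. The sketch below shows the three modes, with an io.BytesIO object standing in for an open binary file so that it runs without any file on disk.

```python
import io
import os

f = io.BytesIO(bytes(range(200)))  # stands in for open("file.bin", "rb")

f.seek(100)                 # absolute offset from the start
middle = f.read(4)

f.seek(10, os.SEEK_CUR)     # skip 10 bytes forward from the current position
after_skip = f.read(2)

f.seek(-4, os.SEEK_END)     # 4 bytes before the end, e.g. a trailer
tail = f.read()

print(list(middle))      # [100, 101, 102, 103]
print(list(after_skip))  # [114, 115]
print(list(tail))        # [196, 197, 198, 199]
```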

Q) How do I detect the size of a binary file in Python?

A) Use os.path.getsize():

import os

size = os.path.getsize("file.bin")
print(size)

Or seek to the end of the file and read the position:

import os

with open("file.bin", "rb") as f:
    f.seek(0, os.SEEK_END)
    size = f.tell()

Q) How can I convert binary data into hexadecimal?

A) Call the hex() method on the bytes object:

with open("file.bin", "rb") as f:
    data = f.read()
hex_output = data.hex()
print(hex_output)

Q) How do I parse a binary file with multiple fields?

A) Use sequential unpack() calls or unpack multiple fields at once:

import struct
with open("file.bin", "rb") as f:
   header = struct.unpack("<IHB", f.read(7))
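When a file holds many records with the same layout, a precompiled struct.Struct plus a namedtuple keeps the parsing code readable. In this sketch the field names (id, flags, kind) are hypothetical, and the records are packed in memory to stand in for a file's contents.

```python
import struct
from collections import namedtuple

# Hypothetical record layout: uint32 id, uint16 flags, uint8 kind (7 bytes).
RECORD = struct.Struct("<IHB")
Record = namedtuple("Record", "id flags kind")

# Three records packed back to back, standing in for a file's contents.
blob = b"".join(RECORD.pack(i, i * 10, i % 2) for i in range(1, 4))

records = [Record._make(fields) for fields in RECORD.iter_unpack(blob)]
print(records[0])    # Record(id=1, flags=10, kind=1)
print(len(records))  # 3
```

iter_unpack() requires the data length to be an exact multiple of the record size, which doubles as a basic integrity check.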

Q) How do I read binary network packets in Python?

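A) Use socket to receive raw bytes and unpack() to decode fields. Raw sockets usually require administrator privileges, and this example assumes Linux, where the received data starts with the IPv4 header:

import socket
import struct
sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
packet = sock.recv(65535)
first_byte = struct.unpack("!B", packet[0:1])[0]
version = first_byte >> 4  # IP version is the high 4 bits of the first byte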

! indicates network (big-endian) byte order.
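The decoding step can be tried without a raw socket or elevated privileges by parsing a hand-built header. The sketch below covers the fixed 20-byte IPv4 header with no options; parse_ipv4_header is a hypothetical helper, and the addresses and field values are invented for the demo.

```python
import socket
import struct

IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes, no options

def parse_ipv4_header(packet):
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    return {
        "version": ver_ihl >> 4,             # high 4 bits
        "header_len": (ver_ihl & 0x0F) * 4,  # low 4 bits count 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hand-built sample header (checksum left as 0 for the demo).
sample = IPV4_HEADER.pack(0x45, 0, 40, 1, 0, 64, 6, 0,
                          socket.inet_aton("192.168.0.1"),
                          socket.inet_aton("10.0.0.2"))
info = parse_ipv4_header(sample)
print(info["version"], info["header_len"], info["src"], info["dst"])
# 4 20 192.168.0.1 10.0.0.2
```

Note that the version shares its byte with the header length, so both are extracted with bit operations after unpacking.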

Conclusion

Reading binary files in Python requires understanding:

  • File structure
  • Byte order
  • Data types
  • Efficient memory usage

By combining open(..., "rb"), struct, mmap, and NumPy, you can safely parse everything from small configuration files to massive scientific datasets.

With proper validation and performance strategies, Python provides one of the most powerful environments for binary data processing.