How to Read Binary Files in Python (With Examples, struct, mmap, and NumPy)
To read a binary file in Python, open it with open("file.bin", "rb"), then call file.read() to get the raw bytes or struct.unpack() to decode structured binary values.
Example:
import struct
with open("data.bin", "rb") as f:
    value = struct.unpack("<I", f.read(4))[0]  # Little-endian unsigned 32-bit integer
print(value)
What Is a Binary File?
A binary file stores data in raw byte format rather than human-readable text. These files are used for:
- Images (BMP, PNG, JPEG)
- Audio and video files
- Executables
- Compressed archives
- Database dumps
- Network packets
- Custom application data formats
Unlike text files, binary files must be interpreted correctly based on their structure and encoding.
Opening a Binary File in Python
Always use binary read mode ("rb"):
with open("file.bin", "rb") as f:
   data = f.read()
Why use with?
- Ensures proper file closure
- Prevents leaked file handles, even when an exception is raised
- Follows Python best practices
Reading Raw Bytes
Read Entire File
with open("file.bin", "rb") as f:
   data = f.read()
print(type(data))  # <class 'bytes'>
Read in Chunks (Recommended for Large Files)
chunk_size = 4096
with open("large.bin", "rb") as f:
   while chunk := f.read(chunk_size):
       process(chunk)  # process() stands in for your own handling code
Reading in chunks:
- Reduces memory usage
- Prevents crashes with large files
- Improves performance in streaming applications
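As one illustration of the chunked pattern, the sketch below hashes a file without ever holding more than one chunk in memory. The helper name sha256_of_file and the throwaway demo file are assumptions made for this example, not part of any particular project.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=4096):
    """Return the SHA-256 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway 10,000-byte file, so several chunks are read.
payload = bytes(range(256)) * 39 + bytes(16)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

file_digest = sha256_of_file(demo_path)
os.remove(demo_path)
print(file_digest == hashlib.sha256(payload).hexdigest())  # True
```

The assignment expression (:=) needs Python 3.8 or later; on older versions a plain while True loop with a break does the same job.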
Decoding Structured Binary Data with struct
Binary files often contain structured fields like integers and floats.
Python’s struct module converts raw bytes into Python values.
Common Format Codes
| Format | C type | Standard size |
| --- | --- | --- |
| b | Signed char | 1 byte |
| B | Unsigned char | 1 byte |
| h | Signed short | 2 bytes |
| H | Unsigned short | 2 bytes |
| i | Signed int | 4 bytes |
| I | Unsigned int | 4 bytes |
| q | Signed long long | 8 bytes |
| Q | Unsigned long long | 8 bytes |
| f | Float | 4 bytes |
| d | Double | 8 bytes |
Endianness
- < → Little-endian
- > → Big-endian
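Without a prefix, struct uses the machine's native byte order, sizes, and alignment, so an explicit prefix is the safer choice for file formats.
Example: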
import struct
with open("data.bin", "rb") as f:
   number = struct.unpack("<I", f.read(4))[0]
   print(number)
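To make the effect of byte order concrete, here is a small sketch that decodes the same four bytes both ways. The byte values are made up for the demonstration.

```python
import struct

raw = bytes([0x01, 0x00, 0x00, 0x00])  # four bytes as they might sit in a file

little = struct.unpack("<I", raw)[0]  # least significant byte first
big = struct.unpack(">I", raw)[0]     # most significant byte first

print(little)  # 1
print(big)     # 16777216

# With an explicit byte-order prefix, sizes are fixed and no padding is added.
print(struct.calcsize("<IHB"))  # 7
```

struct.calcsize() is a convenient way to check how many bytes a format string expects before calling read().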
Reading Large Binary Files Efficiently
Option 1: Memory Mapping with mmap
Best for large files requiring random access.
import mmap
with open("large.bin", "rb") as f:
   mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
   header = mm[0:16]
   mm.close()
Advantages:
- No full file loading into memory
- Faster random access
- Lower memory footprint
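The following sketch shows random access into a mapped file combined with struct.unpack_from(). The record layout (1,000 little-endian unsigned 32-bit integers) and the helper name read_record are assumptions chosen only for illustration.

```python
import mmap
import os
import struct
import tempfile

# Build a throwaway file of 1,000 little-endian unsigned 32-bit integers
# (values 0..999). The layout is invented for this demo.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<1000I", *range(1000)))
    demo_path = tmp.name


def read_record(mm, index, record_format="<I"):
    """Decode one fixed-size record at the given index from a mapped file."""
    size = struct.calcsize(record_format)
    return struct.unpack_from(record_format, mm, index * size)[0]


with open(demo_path, "rb") as f:
    # Length 0 maps the whole file; the context manager closes the mapping.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        first = read_record(mm, 0)
        middle = read_record(mm, 500)
        last = read_record(mm, 999)

os.remove(demo_path)
print(first, middle, last)  # 0 500 999
```

Using the mapping as a context manager avoids a forgotten mm.close() if decoding raises an exception.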
Option 2: Using memoryview (Avoid Copies)
with open("file.bin", "rb") as f:
   data = f.read()
   view = memoryview(data)
   slice_part = view[0:10]  # Slicing a memoryview does not copy the bytes
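A short sketch of why memoryview helps: slices of a view refer to the original buffer, and struct can decode directly from them. The 32-byte buffer here is a stand-in for real file contents.

```python
import struct

data = bytes(range(32))  # stand-in for the result of f.read()
view = memoryview(data)

payload = view[16:32]       # a view onto bytes 16..31, no copy made
print(payload.obj is data)  # True: the slice still refers to the original object
print(len(payload))         # 16

# struct can decode straight from the view.
first_word = struct.unpack_from("<H", payload, 0)[0]
print(first_word)           # 4368

# Call bytes() only when an independent copy is actually needed.
copied = bytes(payload[:4])
print(copied)               # b'\x10\x11\x12\x13'
```

By contrast, slicing a bytes object directly (data[16:32]) allocates a new bytes object each time.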
Reading Binary Numeric Arrays with NumPy
If your file contains large numeric datasets:
import numpy as np
arr = np.fromfile("array.bin", dtype=np.float32)
print(arr[:5])
Benefits:
- Extremely fast
- Ideal for scientific computing
- Memory-efficient
Real-World Example: Reading a BMP Image Header
A BMP file with the common 40-byte BITMAPINFOHEADER stores width and height as little-endian signed 32-bit integers at byte offsets 18 and 22.
import struct
with open("image.bmp", "rb") as f:
   header = f.read(54)
   width, height = struct.unpack_from("<ii", header, offset=18)
print("Width:", width)
print("Height:", height)
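A slightly more defensive variant might check the "BM" signature and the header size before trusting the offsets. The sketch below assumes the 14-byte file header is followed by the 40-byte BITMAPINFOHEADER; the function name read_bmp_info and the hand-built sample header are illustrative only.

```python
import io
import struct

# 14-byte file header followed by the 40-byte BITMAPINFOHEADER.
BMP_HEADER = struct.Struct("<2sIHHI" "IiiHHIIiiII")


def read_bmp_info(stream):
    """Return (width, height, bits_per_pixel) from a BMP stream."""
    raw = stream.read(BMP_HEADER.size)
    if len(raw) != BMP_HEADER.size:
        raise EOFError("File too short for a BMP header")
    fields = BMP_HEADER.unpack(raw)
    magic, dib_size = fields[0], fields[5]
    if magic != b"BM":
        raise ValueError("Not a BMP file (missing 'BM' signature)")
    if dib_size != 40:
        raise ValueError("Unsupported header variant")
    width, height, bits_per_pixel = fields[6], fields[7], fields[9]
    return width, height, bits_per_pixel


# Hand-built header for the demo: 640 x 480, 24 bits per pixel, top-down rows.
sample = BMP_HEADER.pack(b"BM", 54, 0, 0, 54,
                         40, 640, -480, 1, 24, 0, 0, 0, 0, 0, 0)
print(read_bmp_info(io.BytesIO(sample)))  # (640, -480, 24)
```

Height is read as a signed value because a negative height marks a top-down bitmap, which is why the "i" format code is used rather than "I".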
For full image processing, use Pillow:
from PIL import Image
img = Image.open("image.bmp")
img.load()
Common Challenges and How to Fix Them
1. Endianness Issues
Incorrect byte order produces wrong values.
Always confirm whether data is little or big-endian.
2. Partial Reads
If f.read(n) returns fewer than n bytes, you may be at EOF.
Safe pattern:
data = f.read(4)
if len(data) != 4:
   raise EOFError("Unexpected end of file")
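The same check can be wrapped in a reusable helper. The sketch below uses a hypothetical read_exact() function that also loops, since pipes, sockets, and unbuffered streams may return fewer bytes than requested before the end of the data.

```python
import io
import struct


def read_exact(stream, size):
    """Read exactly `size` bytes from a binary stream or raise EOFError."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"Expected {size} bytes, got {size - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


stream = io.BytesIO(b"\x2a\x00\x00\x00\xff")
value = struct.unpack("<I", read_exact(stream, 4))[0]
print(value)  # 42

error_message = None
try:
    read_exact(stream, 4)  # only one byte is left
except EOFError as exc:
    error_message = str(exc)
print(error_message)  # Expected 4 bytes, got 1
```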
3. Memory Errors
Avoid calling f.read() with no size argument on multi-GB files, since it loads the entire file into memory.
Use chunked reading or mmap.
4. Unknown File Format
Check:
- Official documentation
- File specification
- Hex dump tools (hexdump, xxd)
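When hexdump or xxd is not available, a rough equivalent can be written in a few lines of Python. The hexdump() function below is a minimal sketch written for this article, not a standard-library function, and the sample bytes are invented.

```python
def hexdump(data, width=16):
    """Format bytes as offset, hex pairs, and printable ASCII, one row per line."""
    hex_width = width * 3 - 1  # room for a full row of hex pairs
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in row)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


sample = b"BM\x36\x00\x0c\x00\x00\x00\x00\x00\x36\x00\x00\x00"
print(hexdump(sample))
```

Looking at the first few dozen bytes this way often reveals a signature (such as "BM" here) that identifies the format.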
Best Practices Checklist
- Always open with "rb"
- Use with statements
- Validate byte lengths before unpacking
- Document field structure clearly
- Use chunked reading for large files
- Use mmap for random access
- Use NumPy for homogeneous numeric arrays
- Write unit tests for parsing logic
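For the last item, parsers are easy to test when they accept any binary stream, because io.BytesIO can stand in for a real file. The record layout and the parse_record() function below are hypothetical and exist only to show the testing pattern.

```python
import io
import struct
import unittest


def parse_record(stream):
    """Parse a hypothetical 6-byte record: uint32 id followed by uint16 flags."""
    raw = stream.read(6)
    if len(raw) != 6:
        raise EOFError("Truncated record")
    record_id, flags = struct.unpack("<IH", raw)
    return {"id": record_id, "flags": flags}


class ParseRecordTests(unittest.TestCase):
    def test_valid_record(self):
        stream = io.BytesIO(struct.pack("<IH", 7, 3))
        self.assertEqual(parse_record(stream), {"id": 7, "flags": 3})

    def test_truncated_record(self):
        with self.assertRaises(EOFError):
            parse_record(io.BytesIO(b"\x01\x02"))


# Run the tests programmatically; `python -m unittest` works as well.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseRecordTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```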
When Should You Read Binary Files?
Common real-world use cases:
- Parsing custom application data
- Reading hardware sensor logs
- Processing scientific datasets
- Network protocol analysis
- Reverse engineering file formats
- Handling media file metadata
FAQs
Q) How do I read a 16-bit integer from a binary file in Python?
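A) Read two bytes and unpack them with the H (unsigned) or h (signed) format code:
import struct
with open("file.bin", "rb") as f:
    value = struct.unpack("<H", f.read(2))[0]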
Q) What is endianness in binary files?
A) Endianness determines byte order in multi-byte values.
Little-endian stores least significant byte first.
Big-endian stores most significant byte first.
Q) What is the fastest way to read large binary files?
A) It depends on the access pattern:
- Chunked reading for streaming
- mmap for random access
- numpy.fromfile() for numeric arrays
Q) Can I convert binary data to text?
A) Yes, if the binary file contains encoded text:
text = data.decode("utf-8")
Only do this if the content is actually encoded text.
Q) How do I read a specific byte range from a binary file?
A) Use seek() to move the file pointer and read() the required bytes:
with open("file.bin", "rb") as f:
   f.seek(100)    # Move to byte 100
   data = f.read(20) # Read next 20 bytes
Q) How do I detect the size of a binary file in Python?
A) Use os.path.getsize():
import os
size = os.path.getsize("file.bin")
print(size)
Or:
with open("file.bin", "rb") as f:
   f.seek(0, 2)  # whence=2 (os.SEEK_END) moves to the end of the file
   size = f.tell()
Q) How can I convert binary data into hexadecimal?
A) Read the bytes and call hex() on them:
with open("file.bin", "rb") as f:
    data = f.read()
hex_output = data.hex()
print(hex_output)
Q) How do I parse a binary file with multiple fields?
A) Use sequential unpack() calls or unpack multiple fields at once:
import struct
with open("file.bin", "rb") as f:
   header = struct.unpack("<IHB", f.read(7))
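When a file holds many records with the same layout, a precompiled struct.Struct together with iter_unpack() and a namedtuple keeps the fields readable. The layout below (timestamp, sensor id, status) is a made-up example, not a real format.

```python
import struct
from collections import namedtuple

# Hypothetical layout: uint32 timestamp, uint16 sensor id, uint8 status.
RECORD = struct.Struct("<IHB")
Reading = namedtuple("Reading", "timestamp sensor_id status")

# Two packed records standing in for the contents of a file.
blob = RECORD.pack(1700000000, 12, 1) + RECORD.pack(1700000060, 12, 0)

# iter_unpack() walks the buffer one record at a time.
readings = [Reading._make(fields) for fields in RECORD.iter_unpack(blob)]
for reading in readings:
    print(reading.timestamp, reading.sensor_id, reading.status)
```

iter_unpack() requires the buffer length to be an exact multiple of the record size, so a truncated file raises struct.error instead of silently yielding a partial record.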
Q) How do I read binary network packets in Python?
A) Use socket to receive raw bytes and unpack() to decode fields:
import socket
import struct
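# Raw sockets usually need administrator privileges, and Linux requires an explicit protocol.
sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
packet = sock.recv(1024)
first_byte = struct.unpack("!B", packet[0:1])[0]
version = first_byte >> 4  # The IP version is the high four bits of the first byte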
! indicates network (big-endian) byte order.
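Since raw sockets are not always available, the decoding step can be tried on plain bytes first. The sketch below parses the fixed 20-byte part of an IPv4 header; the function name parse_ipv4_header and the hand-built sample packet are assumptions for illustration.

```python
import socket
import struct

IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # fixed 20-byte part of the header


def parse_ipv4_header(packet):
    """Decode the fixed part of an IPv4 header from raw packet bytes."""
    if len(packet) < IPV4_HEADER.size:
        raise ValueError("Packet too short for an IPv4 header")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    return {
        "version": version_ihl >> 4,                # high four bits
        "header_length": (version_ihl & 0x0F) * 4,  # low four bits count 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-built sample header (checksum left at zero, so not valid on the wire).
sample = IPV4_HEADER.pack(0x45, 0, 40, 1, 0, 64, 6,
                          0, bytes([192, 168, 0, 1]), bytes([10, 0, 0, 5]))
info = parse_ipv4_header(sample)
print(info["version"], info["header_length"], info["src"], info["dst"])
# 4 20 192.168.0.1 10.0.0.5
```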
Conclusion
Reading binary files in Python requires understanding:
- File structure
- Byte order
- Data types
- Efficient memory usage
By combining open() in "rb" mode, struct, mmap, and NumPy, you can safely parse everything from small configuration files to massive scientific datasets.
With proper validation and the performance strategies above, Python is a practical environment for binary data processing.