Memory Profiling in Python: Find and Fix Memory Bottlenecks in Your Data Science Code
Hey there, ML Researchers and Data Scientists! 👋 Ever found your Python script crashing because it ran out of memory while processing large datasets? Or noticed your model training slowing to a crawl due to memory swapping? I've been there too! Today, we're diving deep into memory_profiler - the essential tool that reveals exactly which lines in your code are consuming the most memory. 🚀
📊 Why Memory Profiling Matters in ML & Data Science
Before we jump into the technical details, let's understand why memory optimization is crucial for our work:
Large Datasets: Working with GBs of image data or massive CSV files
Model Training: Neural networks with millions of parameters
Feature Engineering: Creating memory-intensive transformations
Production Pipelines: Efficient resource utilization in deployment
Traditional profiling tools tell you about time complexity, but memory_profiler reveals your space complexity - and in data science, memory is often the limiting factor!
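Before reaching for a profiler at all, you can often predict an array's footprint with back-of-envelope arithmetic: elements × bytes per element. A quick sketch (plain NumPy, with illustrative shapes):

```python
import numpy as np

# float64 stores 8 bytes per element, so the footprint is just
# the product of the shape dimensions times 8.
X = np.zeros((10000, 100))        # 10K samples, 100 features
print(X.nbytes)                   # 8_000_000 bytes
print(X.nbytes / 1024**2)         # ≈ 7.63 MiB

# Tripling the feature count (e.g. a polynomial expansion) triples it
X_poly = np.concatenate([X, X**2, X**3], axis=1)
print(X_poly.nbytes / 1024**2)    # ≈ 22.89 MiB
```

When the profiler later reports a jump much bigger than this arithmetic predicts, hidden temporaries are usually the culprit.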
🛠️ Installing memory_profiler
```bash
pip install memory-profiler
```
Pro Tip: Install with optional dependencies for better functionality:
```bash
pip install memory-profiler psutil
```
🔬 Basic Usage: Line-by-Line Memory Analysis
Here's how to profile memory usage in your Python functions:
Step 1: Decorate Your Function
```python
from memory_profiler import profile
import numpy as np

@profile
def process_large_dataset():
    # Common ML operations that consume memory
    X_train = np.random.rand(10000, 100)  # 10K samples, 100 features
    y_train = np.random.randint(0, 2, 10000)

    # Feature engineering - common memory hog
    polynomial_features = np.concatenate([X_train, X_train**2, X_train**3], axis=1)

    # Model training (simulated)
    weights = np.linalg.pinv(polynomial_features) @ y_train

    # Clean up
    del polynomial_features
    return weights

if __name__ == "__main__":
    process_large_dataset()
```
Step 2: Run with Memory Profiling
```bash
python -m memory_profiler your_script.py
```
📈 Sample Output Analysis
```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4     45.21 MiB    45.21 MiB           1   @profile
     5                                          def process_large_dataset():
     7     52.84 MiB     7.63 MiB           1       X_train = np.random.rand(10000, 100)
     8     52.92 MiB     0.08 MiB           1       y_train = np.random.randint(0, 2, 10000)
    11     75.81 MiB    22.89 MiB           1       polynomial_features = np.concatenate([X_train, X_train**2, X_train**3], axis=1)
    14     75.89 MiB     0.08 MiB           1       weights = np.linalg.pinv(polynomial_features) @ y_train
    17     53.00 MiB   -22.89 MiB           1       del polynomial_features
    18     53.00 MiB     0.00 MiB           1       return weights
```
🔍 Key Insights:
The `polynomial_features` line shows a 22.89 MiB jump (a 10000 × 300 float64 array) - the polynomial feature expansion is the memory bottleneck!
The `del` statement effectively frees that memory again
We can see exactly where to focus our optimization efforts
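You can double-check the `del` behaviour without memory_profiler at all, using the standard library's tracemalloc - a minimal sketch:

```python
import tracemalloc

tracemalloc.start()

data = [0] * 1_000_000                        # ~8 MB of pointers on 64-bit Python
with_data, _ = tracemalloc.get_traced_memory()

del data                                      # last reference gone -> memory is released
after_del, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()
print(with_data > after_del)                  # True: the allocation was freed
```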
🚀 Advanced Techniques for ML Workflows
1. Time-Based Memory Tracking with mprof
For long-running training scripts, use mprof to visualize memory usage over time:
```bash
# Track memory during model training
mprof run train_model.py

# Generate a plot of memory usage over time
mprof plot --output memory_plot.png
```
This is perfect for monitoring:
Memory leaks during epoch iterations
Batch processing memory patterns
GPU memory correlation (when using CUDA)
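If you want mprof-style sampling from inside a script (say, to log a memory curve to your experiment tracker), the idea is easy to emulate with the standard library. A rough sketch using tracemalloc and a sampler thread - not mprof itself, and note that tracemalloc counts Python-level allocations rather than total process RSS, so the numbers will differ from mprof's:

```python
import threading
import time
import tracemalloc

def sample_memory(stop, samples, interval=0.05):
    """Record currently traced memory every `interval` seconds, mprof-style."""
    while not stop.is_set():
        current, _peak = tracemalloc.get_traced_memory()
        samples.append(current)
        time.sleep(interval)

tracemalloc.start()
stop, samples = threading.Event(), []
sampler = threading.Thread(target=sample_memory, args=(stop, samples))
sampler.start()

buffers = [bytearray(1_000_000) for _ in range(20)]  # stand-in for training allocations
time.sleep(0.3)                                      # let the sampler observe them

stop.set()
sampler.join()
tracemalloc.stop()
print(f"{len(samples)} samples, peak ≈ {max(samples) / 1024**2:.1f} MiB")
```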
2. Jupyter Notebook Integration
Profile directly in your research notebooks:
```python
%load_ext memory_profiler

# Profile a single line
%memit np.dot(large_matrix1, large_matrix2)
```
```python
%%memit
# Profile a whole cell (%%memit must be the first line of the cell)
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
3. Profiling Specific Code Blocks
```python
from memory_profiler import memory_usage

def train_complex_model():
    # Setup code
    preprocessed_data = load_and_clean_data()

    # Profile only the training phase; retval=True also captures the return value
    mem_usage, model = memory_usage(
        (actual_training_function, (preprocessed_data,)),
        interval=0.1,
        retval=True,
    )
    print(f"Peak memory during training: {max(mem_usage):.2f} MiB")
    return model
```
💡 Real-World ML Optimization Examples
Case Study 1: Image Data Pipeline
```python
import cv2
import numpy as np
from memory_profiler import profile

@profile
def load_image_batch(image_paths, batch_size=100):
    """Common memory issue: loading all images at once"""
    images = []
    for path in image_paths[:batch_size]:
        img = cv2.imread(path)  # 🚨 Memory spike here!
        img = cv2.resize(img, (224, 224))
        images.append(img)
    # Convert to numpy array - duplicates memory!
    batch = np.array(images)  # 🚨 Another memory jump!
    return batch

# Optimization: pre-allocate the batch and fill it in place
def optimized_image_loader(image_paths, batch_size=100):
    batch = np.empty((batch_size, 224, 224, 3), dtype=np.uint8)
    for i, path in enumerate(image_paths[:batch_size]):
        batch[i] = cv2.resize(cv2.imread(path), (224, 224))
    return batch
```
Case Study 2: Large Matrix Operations
```python
from memory_profiler import profile
import numpy as np

@profile
def inefficient_covariance(X):
    # Common mistake: creating intermediate copies
    X_centered = X - np.mean(X, axis=0)  # 🚨 Temporary copy
    covariance = (X_centered.T @ X_centered) / (X.shape[0] - 1)  # 🚨 Another copy
    return covariance

# Optimization: use the built-in implementation
def efficient_covariance(X):
    covariance = np.cov(X, rowvar=False)  # Built-in optimized implementation
    return covariance
```
📊 Benchmarking Different Data Types
Different data structures have varying memory footprints. Here's a quick comparison:
| Data Type | Size (10K elements) | Memory Efficient? |
|---|---|---|
| Python list | ~800 KB | ❌ |
| NumPy array (int32) | ~40 KB | ✅ |
| Pandas Series | ~1.2 MB | ⚠️ |
| Pandas categorical | ~200 KB | ✅ |
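These figures can be sanity-checked without a profiler: `sys.getsizeof` covers the list (plus its int objects), `.nbytes` covers NumPy, and pandas objects expose `.memory_usage(deep=True)`. A sketch for the first two (exact list figures vary by Python version):

```python
import sys
import numpy as np

n = 10_000
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int32)

# getsizeof counts only the list's pointer array, so add the int objects too
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(i) for i in python_list)

print(f"list:  ~{list_bytes / 1024:.0f} KB")
print(f"numpy: ~{numpy_array.nbytes / 1024:.0f} KB")  # 4 bytes/element -> ~39 KB
```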
```python
from memory_profiler import profile
import numpy as np
import pandas as pd

@profile
def compare_data_structures():
    # Test different data structures (10K elements each, matching the table)
    python_list = list(range(10000))
    numpy_array = np.arange(10000, dtype=np.int32)
    pandas_series = pd.Series(range(10000))
    pandas_categorical = pd.Series(range(10000), dtype='category')
```
🎯 Pro Tips for ML Practitioners
1. Use Generators for Large Datasets
```python
import pandas as pd

def data_generator(file_path, chunk_size=1000):
    """Yield data in chunks to avoid loading everything into memory"""
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        yield preprocess_chunk(chunk)
```
2. Monitor GPU Memory Too
```python
import torch

def check_gpu_memory():
    if torch.cuda.is_available():
        print(f"GPU Memory allocated: {torch.cuda.memory_allocated() / 1024**2:.2f} MB")
        print(f"GPU Memory reserved: {torch.cuda.memory_reserved() / 1024**2:.2f} MB")
```
3. Set Memory Limits in Production
```python
import resource

def set_memory_limit(limit_gb=4):
    # Cap the process's address space; allocations beyond it raise MemoryError
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit_gb * 1024**3, hard))
```
🔍 Common Memory Pitfalls in Data Science
Pandas Memory Bloat: Loading entire CSV files instead of using `chunksize`
List Comprehensions with Large Data: Creating intermediate lists
Sklearn Pipeline Copies: Multiple copies of data in transformation pipelines
Image Data Type Issues: Using float64 instead of float32 for images
Model Serialization: Saving entire training history unnecessarily
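The float64-vs-float32 pitfall deserves a concrete number: image tensors come out as float64 under many NumPy operations, doubling memory versus float32 with no practical benefit for image data. A quick check:

```python
import numpy as np

# A batch of 100 RGB images at 224x224, normalized to [0, 1]
batch64 = np.random.rand(100, 224, 224, 3)   # float64 by default
batch32 = batch64.astype(np.float32)

print(batch64.nbytes / 1024**2)  # ≈ 114.8 MiB
print(batch32.nbytes / 1024**2)  # ≈ 57.4 MiB - half the footprint
```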
📈 Performance Comparison
Here's how much you can save with proper memory profiling:
| Scenario | Before Optimization | After Optimization | Savings |
|---|---|---|---|
| Image Batch Processing | 8.2 GB | 2.1 GB | 74% ✅ |
| Feature Engineering | 12.4 GB | 3.8 GB | 69% ✅ |
| Data Preprocessing | 6.7 GB | 1.9 GB | 72% ✅ |
🚀 Next Steps & Further Reading
Integrate with your ML workflow: Add memory profiling to your experiment tracking
Set up memory monitoring: Use `mprof` in your CI/CD pipeline
Combine with time profiling: Use `py-spy` for comprehensive performance analysis
Explore alternatives: Check out `filprofiler` for even more detailed analysis
💎 Key Takeaways
🎯 Memory profiling is essential for scalable ML applications
🔍 Identify exact bottlenecks with line-by-line analysis
📊 Visualize memory trends with `mprof` for long-running tasks
💡 Optimize data structures and use generators for large datasets
🚀 Significant performance gains are achievable with targeted optimizations
💬 Discussion Time! What's been your biggest memory challenge in ML projects? Reply to this email with your stories or questions - I read every response! 💌
🔔 Next Week: We're diving into "Optimizing Pandas: From Minutes to Milliseconds" - you won't want to miss it!
Happy coding! 🐍