My work spans AI storage optimization, distributed systems, and high-throughput data processing. Below are key areas and projects I have worked on.

Multi-Protocol Storage for ML Workloads

Building and operating storage systems that expose both NFS (for legacy HPC workflows) and S3 (for cloud-native pipelines) over the same dataset. Key challenges include:

  • Protocol semantic differences (POSIX vs object storage)
  • Consistency models for concurrent access patterns
  • Performance optimization for sequential reads (training) vs random access (checkpointing/inference)
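The sequential-vs-random distinction above can be made concrete with a small micro-benchmark. This is an illustrative sketch (stdlib only, Unix `os.pread`); the file layout and `read_pattern` helper are hypothetical stand-ins for a training shard, not code from the projects described here:

```python
import os, time, random, tempfile

def read_pattern(path, block_size, offsets):
    """Read block_size bytes at each offset; return elapsed seconds."""
    fd = os.open(path, os.O_RDONLY)
    start = time.perf_counter()
    for off in offsets:
        os.pread(fd, block_size, off)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed

# Build a small test file (a stand-in for a training shard).
block = 64 * 1024
n_blocks = 256
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(block * n_blocks))
    path = f.name

sequential = [i * block for i in range(n_blocks)]
shuffled = sequential[:]
random.shuffle(shuffled)

t_seq = read_pattern(path, block, sequential)
t_rand = read_pattern(path, block, shuffled)
os.unlink(path)
print(f"sequential: {t_seq:.4f}s  random: {t_rand:.4f}s")
```

Note that against a warm page cache the two timings converge; the gap only shows up against real backing storage, which is exactly why training (sequential) and checkpoint/inference (random) access patterns need separate optimization.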

Tier-0 Caching Strategies for GPU Clusters

Researching node-local caching architectures to minimize data transfer over the network fabric:

  • Pre-fetching strategies based on training iteration patterns
  • Cache eviction policies
  • RDMA integration for microsecond-scale data-transfer latency
  • Benchmarking with PyTorch and TensorFlow data pipelines
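The first two bullets above can be sketched together: because a training epoch's iteration order is known up front, a node-local cache can prefetch exactly the shards the loader will request next, with LRU eviction bounding local capacity. This is a minimal illustration; `NodeLocalCache`, `fetch_fn`, and the shard schedule are hypothetical names, not part of the actual system:

```python
from collections import OrderedDict

class NodeLocalCache:
    """Illustrative node-local shard cache: LRU eviction plus
    look-ahead prefetch keyed to the known training iteration order."""

    def __init__(self, fetch_fn, capacity=4, lookahead=2):
        self.fetch_fn = fetch_fn      # pulls a shard from remote storage
        self.capacity = capacity
        self.lookahead = lookahead
        self.cache = OrderedDict()    # shard_id -> bytes, LRU order
        self.hits = self.misses = 0

    def _insert(self, shard_id):
        if shard_id not in self.cache:
            self.cache[shard_id] = self.fetch_fn(shard_id)
            while len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used

    def get(self, shard_id, schedule, pos):
        if shard_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(shard_id)  # mark most recently used
        else:
            self.misses += 1
            self._insert(shard_id)
        data = self.cache[shard_id]
        # Prefetch the next shards the iterator is about to ask for.
        for nxt in schedule[pos + 1 : pos + 1 + self.lookahead]:
            self._insert(nxt)
        return data

# Usage: the epoch schedule is known up front, so prefetch is exact.
schedule = [0, 1, 2, 3, 0, 1, 2, 3]
cache = NodeLocalCache(fetch_fn=lambda s: b"shard-%d" % s, capacity=4)
for pos, shard in enumerate(schedule):
    cache.get(shard, schedule, pos)
print(cache.hits, cache.misses)  # → 7 1
```

After the single cold miss, prefetch converts every subsequent access into a hit; real eviction policies also weigh shard size and re-use distance, which a plain LRU ignores.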

Storage Benchmarking for Transformer Training

Profiling storage I/O patterns for large language model training:

  • Characterizing data loading bottlenecks in transformer architectures
  • Tools: FIO, elbencho, custom PyTorch profilers
  • Building load-testing frameworks on top of elbencho to test file and object storage
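The core idea behind the custom profilers mentioned above can be shown framework-agnostically: split each training step's wall time into "waiting for data" vs "compute"; a high wait fraction means the storage/data pipeline, not the model, is the bottleneck. A sketch with hypothetical names (`profile_data_loading`, `slow_loader` simulate the real pipeline):

```python
import time

def profile_data_loading(loader, step_fn, n_steps):
    """Split wall time per step into data-wait vs compute seconds."""
    wait_s = compute_s = 0.0
    it = iter(loader)
    for _ in range(n_steps):
        t0 = time.perf_counter()
        batch = next(it)            # time blocked on the data pipeline
        t1 = time.perf_counter()
        step_fn(batch)              # stand-in for the forward/backward pass
        t2 = time.perf_counter()
        wait_s += t1 - t0
        compute_s += t2 - t1
    return wait_s, compute_s

# Toy pipeline: a slow "loader" and a fast "step" mimic an I/O-bound job.
def slow_loader():
    while True:
        time.sleep(0.01)            # simulated read from storage
        yield b"batch"

wait, comp = profile_data_loading(slow_loader(), lambda b: None, n_steps=5)
print(f"data wait: {wait:.3f}s  compute: {comp:.3f}s")
```

The same measurement wrapped around a PyTorch `DataLoader` and training step is what characterizes data-loading bottlenecks in practice.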

High-Throughput Streaming Data Ingestion and Delivery

Scaling a data ingestion pipeline to billions of events, with delivery across thousands of partitions (S3 prefixes):

  • Dynamic partitioning redesign: an Akka Streams architecture achieving a 400% throughput improvement
  • Backpressure handling and flow control in reactive systems
  • Lease management for distributed coordination
  • SQS/DynamoDB for workload distribution and state management
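The lease-management bullet above typically rests on DynamoDB conditional writes: a worker claims a partition only if no unexpired lease exists, so exactly one worker wins each race. The sketch below uses an in-memory stand-in (`LeaseTable` is hypothetical) to show the acquisition logic without an AWS dependency:

```python
import time, uuid

class LeaseTable:
    """In-memory stand-in for a DynamoDB lease table. Acquisition mimics a
    conditional write: succeed only if the lease is absent, expired, or ours."""

    def __init__(self):
        self.rows = {}  # partition_id -> (owner, expires_at)

    def try_acquire(self, partition_id, owner, ttl_s):
        now = time.monotonic()
        row = self.rows.get(partition_id)
        if row is None or row[1] <= now or row[0] == owner:
            # In DynamoDB this is a PutItem with a ConditionExpression;
            # the conditional update is what makes the race safe.
            self.rows[partition_id] = (owner, now + ttl_s)
            return True
        return False

table = LeaseTable()
w1, w2 = str(uuid.uuid4()), str(uuid.uuid4())
assert table.try_acquire("partition-7", w1, ttl_s=0.05)      # w1 wins
assert not table.try_acquire("partition-7", w2, ttl_s=0.05)  # w2 is blocked
time.sleep(0.06)
assert table.try_acquire("partition-7", w2, ttl_s=0.05)      # lease expired; w2 takes over
```

Workers renew their lease on a heartbeat shorter than the TTL; a crashed worker simply stops renewing and its partitions are reclaimed after expiry.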

Performance Engineering

Approaches to performance optimization:

  • Profiling Java services for memory leaks and GC tuning
  • Native memory analysis: comparing Unix memory allocators (glibc malloc vs jemalloc) and heap-dump investigation
  • Identifying hot paths and optimizing critical sections
  • Monitoring and observability for production systems
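The hot-path bullet above generalizes across runtimes: the Java work relies on JVM tooling (JFR, heap dumps), but the approach is the same everywhere — profile, sort by cumulative time, and the dominant frame is the optimization target. A self-contained illustration with Python's stdlib `cProfile` (the `slow_hot_path` workload is a deliberately quadratic stand-in):

```python
import cProfile, pstats, io

def slow_hot_path(n):
    # Deliberately quadratic: the kind of hot spot profiling surfaces.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

def workload():
    slow_hot_path(200)
    sum(range(10_000))  # cheap path, for contrast

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("hot path found:", "slow_hot_path" in report)  # → hot path found: True
```

Sorting by cumulative time (rather than call count) is what separates a genuinely hot path from a merely chatty one.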

Research Interests

Current areas of exploration and experimentation:

  • RDMA-enabled storage: Low-latency data access for AI workloads
  • Caching hierarchies: Multi-tier caching strategies (node-local, rack-level, datacenter)
  • Storage cost optimization: Balancing performance and cost for training workloads
  • Benchmarking methodologies: Standardized approaches for storage performance evaluation

Technical Writing & Speaking

As an aspiring technical author, I’m interested in writing about:

  • Storage architecture for AI/ML infrastructure
  • Distributed systems design patterns
  • Performance optimization techniques
  • Real-world case studies from production systems

Interested in collaboration, co-authorship, or speaking opportunities? Reach out.