Research Assistant
- Efficient Deep Learning: Designs and analyzes post-training quantization strategies for CNN–Transformer hybrid and Vision Transformer models, with emphasis on deployment efficiency and latency–accuracy trade-offs on edge devices.
- LLM Systems: Designs and evaluates retrieval-augmented generation (RAG) pipelines built on quantized embedding and generation models, using efficient serving frameworks (e.g., vLLM-style execution) and optimized attention mechanisms to study inference cost–latency trade-offs in domain-specific applications.
- Inference Optimization: Implements and evaluates inference-time optimizations for deep learning and LLM-based systems, including runtime selection, attention optimization, and deployment-aware execution strategies.
- Survey & Technical Synthesis: Conducts large-scale literature synthesis on Vision Transformers and edge acceleration, integrating insights from model compression, system-level optimization, and hardware-aware design.
- Evaluation & Reproducibility: Develops reproducible profiling and ablation pipelines to support deployment-aware analysis across models, datasets, and hardware platforms.