Ron Jailall

Principal AI Engineer & Architect

Raleigh, NC | (608) 332-8605

rojailal@gmail.com

https://ironj.github.io/

Professional Profile

Principal AI Engineer & Architect with 15+ years of experience delivering high-impact ML solutions and distributed systems. Expert in NVIDIA AI ecosystems (TensorRT, Riva, Jetson), Agentic Workflows, and LLM Inference Optimization. Proven track record of taking complex AI features from concept to production, optimizing for both cloud (AWS/GCP) and edge environments. Passionate about bridging the gap between research and real-world application to solve enterprise-scale problems.

Core Competencies

NVIDIA AI Stack: TensorRT, Triton Inference Server, NVIDIA Riva, JetPack/DeepStream, CUDA Optimization.
Agentic & GenAI: RAG Pipelines, Multi-Step Reasoning Agents, LangChain concepts, LLM Fine-tuning (PEFT), Vector Databases.
ML Optimization: Quantization (INT8/FP16), KV Caching strategies, ONNX Runtime, MobileNet/EfficientNet architectures.
Engineering Leadership: End-to-End ML Lifecycle (MLOps), CI/CD (GitLab/Terraform), Mentorship, Technical Strategy.

Selected Technical Talks

Hyperfast AI: Rethinking Design for 1000 tokens/s

AI Tinkerers Raleigh, Dec 2025
  • Presented on Agentic AI design patterns enabled by ultra-low latency inference, focusing on self-correction loops and real-time planning.

Apple's On-Device VLM: The Future of Multimodal AI

Conference Talk, Sep 2025
  • Deep dive into optimizing Vision Language Models for edge devices, a key parallel to NVIDIA's edge AI strategies.

Professional Experience

ML Engineering Consultant / Technical Lead

2024 – Present
Remote

Delivering intelligent AI solutions and optimized ML architectures for diverse enterprise clients.

  • NVIDIA Ecosystem Optimization & Migration: Migrated NVIDIA Riva/Triton microservice-based applications to AWS, re-architecting them for high availability. Optimized computer vision models for embedded NVIDIA Jetson platforms using TensorRT/ONNX, substantially reducing inference latency. Architected a Jetson-based multi-view camera system for precise real-time tracking.
  • Agentic AI & High-Performance Inference (Cerebras Project): Developing a high-speed Agentic AI framework leveraging >1000 tokens/s inference to enable complex, multi-step reasoning. Designed interruptible agent loops that handle ambiguity and user correction in real-time.
  • End-to-End ML Systems Ownership (Matte Model): Architected a CPU-efficient portrait matting system (MobileNetV2/MODNet) using TensorFlow 2, replacing legacy SDKs in a production Electron app. Built full MLOps pipelines on GCP Vertex AI for training, evaluation, and seamless ONNX deployment.

Lead Engineer, AI R&D

2023 – 2024
Vidable.ai | Remote

Spearheaded the design and deployment of Generative AI solutions and RAG pipelines.

  • RAG & GenAI Development: Designed and implemented real-time RAG-based inference pipelines (vector DBs, LLMs) to power dynamic, personalized recommendations and search. Evaluated and productized the latest GenAI tools (MLflow, AuroraDB), bridging the gap between PhD research and engineering implementation.
  • ML Infrastructure & Ops: Built CI/CD pipelines in Terraform/GitLab/AWS to deploy scalable ML inference servers (Docker, Kubernetes). Optimized performance of Python-based API endpoints (FastAPI/Uvicorn) and modified C/C++ code (llama.cpp) for custom production use cases.
  • Mentorship: Hosted weekly cross-company meetings to share the latest AI news and best practices, mentoring engineers on new prompting techniques and model architectures.

Lead Engineer

2014 – 2023
Sonic Foundry | Remote

Led the transition to cloud-native architectures and pioneered early AI adoption.

  • Enterprise Scale: Architected data pipelines serving the company's 5 largest customers (100k+ users), ensuring high reliability and throughput.
  • Cloud Architecture: Designed an AWS cloud-native Archive utility (Lambda, Batch) that generated millions in revenue, demonstrating ability to align technical design with business outcomes.
  • Early Transformer & LLM Work: Developed pipelines to deploy GGML Llama models to AWS serverless architecture (C++, C#) for cost-effective inference. Developed prefix-tuning datasets for the NVIDIA Megatron LLM, gaining early experience with large-scale model adaptation.
  • Innovation: Founded the company's internal AI reading group (~25% employee participation) and led hackathons to drive AI adoption across the org.

Education & Certifications

NC State University | Electrical & Computer Engineering (75 Credit Hours) | 15+ Years Equivalent Industry Experience
Coursera Verified Certificates:
  • Neural Networks for Machine Learning (Hinton) | ID: 3MJACUGZ4LMA
  • Image and Video Processing | ID: E9JX646TTS