Ron Jailall

Principal AI Engineer & Architect

Raleigh, NC | (608) 332-8605

rojailal@gmail.com

https://ironj.github.io/

Professional Profile

Principal AI Engineer & Architect with 15+ years of experience delivering high-impact ML solutions and distributed systems. Expert in NVIDIA AI ecosystems (TensorRT, Riva, Jetson), Agentic Workflows, and LLM Inference Optimization. Proven track record of taking complex AI features from concept to production, optimizing for both cloud (AWS/GCP) and edge environments. Passionate about bridging the gap between research and real-world application to solve enterprise-scale problems.

Core Competencies

NVIDIA AI Stack: TensorRT, Triton Inference Server, NVIDIA Riva, JetPack/DeepStream, CUDA Optimization.
Agentic & GenAI: RAG Pipelines, Multi-Step Reasoning Agents, LangChain concepts, LLM Fine-tuning (PEFT), Vector Databases.
ML Optimization: Quantization (INT8/FP16), KV Caching strategies, ONNX Runtime, MobileNet/EfficientNet architectures.
Engineering Leadership: End-to-End ML Lifecycle (MLOps), CI/CD (GitLab/Terraform), Mentorship, Technical Strategy.

Selected Technical Talks

Hyperfast AI: Rethinking Design for 1000 tokens/s

AI Tinkerers Raleigh, Dec 2025
  • Presented on Agentic AI design patterns enabled by ultra-low latency inference, focusing on self-correction loops and real-time planning.

Apple's On-Device VLM: The Future of Multimodal AI

Conference Talk, Sep 2025
  • Deep dive into optimizing Vision Language Models for edge devices, a key parallel to NVIDIA's edge AI strategies.

Professional Experience

ML Engineering Consultant / Technical Lead

2024 – Present
Remote

Delivering intelligent AI solutions and optimized ML architectures for diverse enterprise clients.

  • NVIDIA Ecosystem Optimization & Migration: Migrated NVIDIA Riva/Triton microservice-based applications to AWS, re-architecting them for high availability. Optimized computer vision models for embedded NVIDIA Jetson platforms using TensorRT/ONNX, substantially reducing inference latency. Architected a Jetson-based multi-view camera system for precise real-time tracking.
  • Agentic AI & High-Performance Inference (Cerebras Project): Developing a high-speed Agentic AI framework leveraging >1000 tokens/s inference to enable complex, multi-step reasoning. Designed interruptible agent loops that handle ambiguity and user correction in real-time.
  • End-to-End ML Systems Ownership (Matte Model): Architected a CPU-efficient portrait matting system (MobileNetV2/MODNet) using TensorFlow 2, replacing legacy SDKs in a production Electron app. Built full MLOps pipelines on GCP Vertex AI for training, evaluation, and seamless ONNX deployment.

Lead Engineer, AI R&D

2023 – 2024
Vidable.ai | Remote

Spearheaded the design and deployment of Generative AI solutions and RAG pipelines.

  • RAG & GenAI Development: Designed and implemented real-time RAG-based inference pipelines (vector DBs, LLMs) to power dynamic, personalized recommendations and search. Evaluated and productized the latest GenAI tools (MLflow, AuroraDB), bridging the gap between PhD research and engineering implementation.
  • ML Infrastructure & Ops: Built CI/CD pipelines in Terraform/GitLab/AWS to deploy scalable ML inference servers (Docker, Kubernetes). Optimized performance of Python-based API endpoints (FastAPI/Uvicorn) and modified C/C++ code (llama.cpp) for custom production use cases.
  • Mentorship: Hosted weekly cross-company meetings to share the latest AI news and best practices, mentoring engineers on new prompting techniques and model architectures.

Lead Engineer

2014 – 2023
Sonic Foundry | Remote

Led the transition to cloud-native architectures and pioneered early AI adoption.

  • Enterprise Scale: Architected data pipelines serving the company's 5 largest customers (100k+ users), ensuring high reliability and throughput.
  • Cloud Architecture: Designed an AWS cloud-native Archive utility (Lambda, Batch) that generated millions in revenue, demonstrating ability to align technical design with business outcomes.
  • Early Transformer & LLM Work: Developed pipelines to deploy GGML Llama models to AWS serverless architecture (C++, C#) for cost-effective inference. Developed prefix-tuning datasets for the NVIDIA Megatron LLM, gaining early experience with large-scale model adaptation.
  • Innovation: Founded the company's internal AI reading group (~25% employee participation) and led hackathons to drive AI adoption across the org.

Education & Certifications

NC State University | Electrical & Computer Engineering (75 Credit Hours) | 15+ Years Equivalent Industry Experience
Coursera Verified Certificates:
  • Neural Networks for Machine Learning (Hinton) | ID: 3MJACUGZ4LMA
  • Image and Video Processing | ID: E9JX646TTS