Ron Jailall

AI Engineering Lead & Architect

Raleigh, NC | (608) 332-8605

rojailal@gmail.com

https://ironj.github.io/

Professional Profile

AI Engineering Lead & Architect with 15+ years of experience building and deploying deep learning solutions. Expert in Transformer and LLM architectures, with strong proficiency in both PyTorch and TensorFlow. Proven ability to own complex problems end to end, from data engineering and model fine-tuning to scalable production deployment. Thrives in fast-paced, ambiguous environments that demand rapid prototyping and cross-functional collaboration.

Technical Skills

Deep Learning & Transformers: LLMs (Llama, Megatron), Vision Transformers (ViT), BERT, Diffusion Models, Policy Optimization, Distillation.
Frameworks: PyTorch, TensorFlow 2/Keras, JAX (basics), ONNX, TensorRT.
Search & Relevance: RAG Pipelines, Vector Embeddings, Neural Search, Recommendation Systems, Semantic Retrieval.
Engineering Principles: Data Structures, Algorithms, CI/CD (GitLab/Terraform), Distributed Systems (AWS/GCP), Python, C++, C#.

Professional Experience

ML Engineering Consultant / Technical Lead

2024 – Present
Remote

Driving end-to-end AI product development, focusing on Search, RAG, and Computer Vision.

  • Search Relevance & Recommendation Systems: Designed and implemented real-time RAG-based inference pipelines powering dynamic, personalized product recommendations, optimizing directly for relevance and user engagement. Built production-grade embedding-model servers behind real-time search and recommendation web apps, ensuring low-latency retrieval (retrieval sketch after this list).
  • End-to-End Deep Learning (Matte Model Project): Took full ownership of a custom human matting solution, architecting a MobileNetV2-based model in TensorFlow 2/Keras to replace legacy SDKs. Managed the entire lifecycle: dataset curation (P3M-10k), custom data augmentation pipelines, training, and ONNX export for cross-platform deployment (export sketch after this list).
  • Transformers & Optimization: Migrated Nvidia Riva/Triton microservices to AWS, optimizing transformer-based ASR/NLP models for scalability. Accelerated Stable Diffusion models for real-time generation, implementing latency optimizations for production video pipelines.
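
A minimal sketch of the embedding-retrieval pattern behind the recommendation pipelines above; the encoder name, catalog items, and scoring are illustrative placeholders, not the production system:

    # Embedding-based retrieval sketch (model and catalog are placeholders).
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not the production model

    catalog = ["wireless earbuds", "running shoes", "espresso machine"]
    catalog_vecs = model.encode(catalog, normalize_embeddings=True)

    def recommend(query: str, k: int = 2):
        """Return the top-k catalog items by cosine similarity to the query."""
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = catalog_vecs @ q  # cosine similarity on unit-normalized vectors
        return [catalog[i] for i in np.argsort(-scores)[:k]]

    print(recommend("bluetooth headphones"))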
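A short sketch of the Keras-to-ONNX export step from the matting project; MobileNetV2 with placeholder weights and a generic input shape stand in for the actual matting network:

    # Keras -> ONNX export sketch (architecture and shapes are stand-ins).
    import tensorflow as tf
    import tf2onnx

    # Placeholder network; the real project used a custom MobileNetV2-based matting head.
    model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3), weights=None)

    spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
    tf2onnx.convert.from_keras(model, input_signature=spec, output_path="matting.onnx")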

Lead Engineer, AI R&D

2023 – 2024
Vidable.ai | Remote

Led a cross-functional RAG team in a fast-paced startup environment, evaluating and deploying LLMs.

  • LLM Fine-Tuning & Application: Collaborated with researchers to evaluate new prompting techniques and model architectures (Llama, Mistral) for customer-specific use cases. Modified C/C++ inference engines (llama.cpp) to meet specific quantization and performance requirements for production APIs (serving sketch after this list).
  • Rapid Prototyping: Built React-based demos powered by LLMs and diffusion models to quickly validate product concepts with stakeholders. Created CI/CD pipelines (Terraform/AWS) to deploy ML inference servers, enabling the team to move fast from experiment to deployed endpoint.
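
A sketch of serving a quantized checkpoint, assuming the llama-cpp-python bindings rather than the modified C/C++ engine itself; the model path and parameters are placeholders:

    # Serving a quantized GGUF model via llama-cpp-python (path/params are placeholders).
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-2-7b.Q4_K_M.gguf",  # assumed quantized checkpoint
        n_ctx=2048,    # context window
        n_threads=8,   # CPU threads for inference
    )

    out = llm("Summarize RAG in one sentence:", max_tokens=64, temperature=0.2)
    print(out["choices"][0]["text"])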

Lead Engineer

2014 – 2023
Sonic Foundry | Remote

Progressed from Engineer to Lead, introducing Deep Learning and Neural Search to the organization.

  • Search & NLP: Prototyped neural search over video archives using segmentation and classification techniques (PyTorch, TensorFlow), significantly improving content discoverability. Created a C#/.NET utility to build sentence corpora using TF-IDF and stemming, preparing data for IBM Watson speech recognition training (sketch after this list).
  • Model Training & Fine-Tuning: Developed a prefix-tuning dataset for the Nvidia Megatron LLM, gaining early experience in parameter-efficient fine-tuning (PEFT) methods. Utilized AWS Sagemaker and PyTorch to build and modify U-Nets for image analysis and segmentation tasks.
  • System Ownership: Designed and built an AWS cloud-native Archive utility (Lambda, Batch) that became a core revenue-generating product, demonstrating ability to solve ambiguous problems end-to-end.
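
A Python sketch of the TF-IDF-plus-stemming corpus preparation described above; the original utility was C#/.NET, and the sentences here are illustrative:

    # TF-IDF over stemmed sentences (Python sketch of the C#/.NET utility's approach).
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer

    stemmer = PorterStemmer()

    def stem_tokens(text: str):
        # Lowercase, whitespace-tokenize, then stem each token.
        return [stemmer.stem(t) for t in text.lower().split()]

    sentences = [
        "The lecture covers neural networks.",
        "Networks of neurons learn representations.",
    ]

    vectorizer = TfidfVectorizer(tokenizer=stem_tokens)
    tfidf = vectorizer.fit_transform(sentences)  # sparse sentence-by-term matrix
    print(vectorizer.get_feature_names_out())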

Selected Technical Talks & Research

Hyperfast AI: Rethinking Design for 1000 tokens/s

AI Tinkerers Raleigh, Dec 2025
  • Discussed the architectural shift required when transformer inference becomes instantaneous, focusing on agentic loops and search capabilities.

Apple's On-Device VLM: The Future of Multimodal AI

Conference Talk, Sep 2025
  • Technical deep dive into optimizing Vision Language Models (Transformers) for resource-constrained environments.

Education & Certifications

NC State University | Electrical & Computer Engineering (Completed 75 Credit Hours)
Coursera Verified Certificates:
  • Neural Networks for Machine Learning (Hinton) | ID: 3MJACUGZ4LMA
  • Image and Video Processing | ID: E9JX646TTS