Ron Jailall - Resume

Professional Profile

Software Engineering Manager & AI Architect with over 15 years of experience leading distributed engineering teams and delivering scalable, cloud-native AI solutions. Expert in bridging the gap between research and production, with deep hands-on experience in Generative AI, LLM infrastructure, and Computer Vision. Proven track record of architecting high-throughput data pipelines on AWS/GCP, optimizing model inference for edge/cloud (TensorRT/ONNX), and mentoring engineering talent to drive product strategy.

Technical Skills

AI/ML & GenAI: TensorFlow 2/Keras, PyTorch, TensorRT, ONNX, RAG Pipelines, Stable Diffusion, LLMs (Llama/Megatron), Computer Vision (MODNet/MobileNetV2).

Cloud & Infra: AWS (SageMaker, Lambda, ECS), GCP (Vertex AI), Docker, Kubernetes, Terraform, GitLab CI/CD.

Languages: Python, C++, C#, JavaScript/TypeScript (React), CUDA, SQL.

Leadership: Agile/Scrum Management, Technical Strategy, Cross-Functional Team Leadership, Mentorship, Product Roadmap Definition.

Selected Technical Talks

Hyperfast AI: Rethinking Design for 1000 tokens/s

AI Tinkerers Raleigh, Dec 2025

Presented on hyperfast inference systems (Cerebras) and the fundamental shift in AI application design required when inference speed crosses critical thresholds.

Apple's On-Device VLM: The Future of Multimodal AI

Conference Talk, Sep 2025

Technical deep dive into the future of on-device multimodal AI, focusing on systems that understand images, video, and physical properties alongside text.

Professional Experience

ML Engineering Consultant / Technical Lead

2024 – Present

Remote

Providing high-level technical leadership and hands-on engineering for diverse clients, focusing on scalable AI infrastructure and Generative AI product development.

Custom Vision Language Model & Infrastructure (Matte Model Project): Architected a CPU-efficient human portrait matting system (MODNet-style/MobileNetV2) using TensorFlow 2/Keras, replacing legacy SDKs in an Electron + React application.
Designed end-to-end MLOps pipelines on GCP Vertex AI for custom training jobs, dataset management (P3M-10k/Adobe), and seamless ONNX export.
Implemented CPU-focused inference strategies utilizing ONNX Runtime, enabling real-time alpha matte generation and video compositing with minimal latency.
Generative AI & RAG Architecture: Designed and implemented real-time RAG-based inference pipelines to power dynamic, personalized product recommendation engines.
Migrated NVIDIA Riva/Triton microservices to AWS, re-architecting the deployment for enhanced scalability and high availability.
Performance Optimization & Edge AI: Optimized Computer Vision models for embedded NVIDIA Jetson platforms using TensorRT and ONNX, achieving significant reductions in inference latency.
Engineered a cloud-based Stable Diffusion video pipeline for real-time webcam re-rendering and accelerated thumbnail generation.
Strategic Leadership: Advise client executive teams on AI integration strategies, identifying opportunities to leverage LLMs and diffusion models for competitive advantage.

Lead Engineer, AI R&D

2023 – 2024

Vidable.ai | Remote

Led the R&D engineering team in evaluating and productizing cutting-edge Generative AI tools.

Team Leadership & Strategy: Collaborated with PhD researchers to translate academic findings into production-ready features. Led weekly cross-company technical forums to align engineering and product teams on the latest AI developments.
Infrastructure Design: Built robust CI/CD pipelines using Terraform, GitLab, and AWS to deploy ML inference servers, supporting microservices architecture across Docker and Kubernetes.
Model Optimization: Modified C/C++ codebases (llama.cpp & Stable Diffusion Turbo) to suit specific business use cases, optimizing performance for Python-based API endpoints (FastAPI/Uvicorn).
Product Innovation: Prototyped and deployed React-based demos powered by LLMs and diffusion models, directly influencing the product roadmap and UX design.

Lead Engineer

2014 – 2023

Sonic Foundry | Remote

Progressed from Engineer to Lead, managing critical data pipelines and spearheading the transition to cloud-native technologies.

Scale & Reliability: Architected and supported data pipelines serving the company's five largest enterprise customers (100k+ end users).
Cloud Transformation: Designed and built an AWS cloud-native Archive utility (Lambda, Batch, Terraform) that was adopted as a core product, generating millions in revenue.
AI Leadership: Founded and led an internal AI/ML reading group (growing to ~25% of the company) and initiated company-wide AI hackathons to foster innovation.
LLM Implementation: Designed the inference architecture for a Zoom/conferencing plugin and developed pipelines to deploy GGML Llama models to AWS serverless architecture.

Technology Specialist

2006 – 2014

NC State University

Led AV/Room Design Engineering for 200+ learning spaces.
Developed novel embedded control systems and accessibility UIs; awarded the OIT Award for Excellence.

Recent Projects & Hackathons

Cerebras OS (2025): Created a hackathon project exploring the UX implications of "instant" AI when inference speed enables real-time, fluid interaction.
FastRecord (2025): Developed a lightweight, local-only screen recording tool emphasizing privacy and low-overhead performance.
Cohere AI Hackathon (2022) - 3rd Place: Developed an "LLM as Learning Assistant" for live presentations, utilizing real-time transcription and agentic prompting.