Ron Jailall - Resume

Professional Profile

AI Agent Architect & Engineering Leader with 15+ years of experience leading teams to deliver scalable, hybrid AI solutions. Expert in turning cutting-edge research into shipped products, with a specific focus on Agentic Workflows, On-Device Inference, and Resource-Constrained Environments. Proven track record of designing multi-step reasoning frameworks, optimizing lightweight models (MobileNet) for edge hardware (NVIDIA Jetson), and building robust evaluation pipelines to ensure agent reliability and safety.

Core Competencies

Agentic AI: Multi-step Workflow Design, Function/Tool Calling, Task Decomposition, RAG & Context Management, Human-in-the-loop Systems.

Hybrid & On-Device AI: TensorFlow/Keras, PyTorch, ONNX Runtime, TensorRT, Quantization, Parameter-Efficient Fine-Tuning (PEFT), Edge Optimization (CPU/GPU).

Infrastructure & Eval: Vector Databases (Embeddings), Evaluation Pipelines, Benchmarking Metrics, CI/CD (GitLab/Terraform), Cloud (AWS/GCP/Vertex AI).

Leadership: Technical Direction, Cross-Functional Collaboration, Engineering Mentorship, Agile/Scrum, Product Strategy.

Selected Technical Talks

Hyperfast AI: Rethinking Design for 1000 tokens/s

AI Tinkerers Raleigh, Dec 2025

Analyzed the impact of ultra-low latency inference on agentic reasoning loops, demonstrating how high-speed token generation enables more robust self-correction and recovery strategies.

Apple's On-Device VLM: The Future of Multimodal AI

Conference Talk, Sep 2025

Technical deep dive into on-device multimodal agents, focusing on privacy-centric architectures that process vision and text locally without cloud dependency.

Professional Experience

ML Engineering Consultant / Technical Lead

2024 – Present

Remote

Technical leadership for diverse clients, architecting Agentic AI systems and Hybrid Infrastructure.

High-Speed AI Agent Framework (Cerebras Project): Developing a novel AI Agent architecture leveraging ultra-fast inference (>1000 tokens/s) to enable real-time, multi-step task decomposition and self-correction. Designed "interruptible" agent workflows that handle ambiguity and user correction in real time, responding with human-like latency in complex reasoning tasks.
On-Device Vision Agent & Hybrid Deployment (Matte Model Project): Architected a CPU-efficient, on-device portrait matting agent (MobileNetV2 backbone) for resource-constrained environments (Electron/Consumer Hardware). Replaced cloud-dependent SDKs with optimized ONNX Runtime inference, reducing latency and enabling offline functionality. Built end-to-end evaluation pipelines on GCP Vertex AI to benchmark model performance, robustness, and accuracy against industry datasets (P3M-10k).
Scale & Infrastructure: Designed RAG-based agent memory systems using vector embeddings to support personalized, context-aware product recommendations. Migrated NVIDIA Riva/Triton microservices to AWS, optimizing for hybrid cloud deployment and ensuring high availability for voice-enabled agent interactions.
Edge AI Optimization: Led optimization efforts for NVIDIA Jetson platforms, utilizing TensorRT to deploy complex computer vision models in strictly resource-limited embedded environments.

Lead Engineer, AI R&D

2023 – 2024

Vidable.ai | Remote

Led the R&D team in developing and evaluating GenAI capabilities for production workflows.

Agent Workflow Evaluation: Established continuous evaluation pipelines to assess prompt robustness, model drift, and agent reliability across multiple LLM backends (OpenAI, Anthropic, Open Source).
Tooling & Automation: Built internal tools to automate the testing of prompt engineering strategies, significantly shortening the time-to-adoption for new GenAI features.
Infrastructure: Architected Terraform/GitLab CI/CD pipelines to deploy inference microservices, supporting both cloud-based and local execution models.
Strategy: Collaborated with Product Managers and PhD researchers to define the technical roadmap, translating recent advances in diffusion models and LLMs into tangible product features.

Lead Engineer

2014 – 2023

Sonic Foundry | Remote

Engineering leadership focused on large-scale system design and cloud transformation.

Team Leadership: Mentored and led a distributed engineering team, fostering collaboration to deliver critical updates for a platform serving 100k+ users.
System Architecture: Designed and deployed a cloud-native archiving utility (AWS Lambda/Batch) that scaled dynamically to handle massive data loads, directly driving millions in revenue.
Innovation: Founded the company's internal AI reading group and led hackathons to explore early agentic workflows and automated content generation.
Early LLM Adoption: Prototyped agent-based search tools and successfully deployed GGML Llama models to serverless architectures for cost-effective inference.

Technology Specialist

2006 – 2014

NC State University

Designed and implemented embedded control systems for 200+ learning spaces, focusing on accessibility and user-centric automation.

Selected Projects

Cerebras OS Agent (2025): An experimental agent framework exploring the UX of "instant" AI, utilizing high-speed inference to perform complex tool-use and reasoning without user-perceived delay.
Agentic Presentation Assistant (Cohere Hackathon - 3rd Place): Built an LLM-based agent utilizing agentic prompting and real-time transcription to act as a live learning assistant, parsing context and fetching external knowledge on the fly.