Ron Jailall

AI Agent Architect & Engineering Leader

Raleigh, NC | (608) 332-8605

rojailal@gmail.com

https://ironj.github.io/

Professional Profile

AI Agent Architect & Engineering Leader with 15+ years of experience leading teams to deliver scalable, hybrid AI solutions. Expert in bridging cutting-edge research and productization, with a focus on Agentic Workflows, On-Device Inference, and Resource-Constrained Environments. Proven track record of designing multi-step reasoning frameworks, optimizing models (MobileNet) for edge hardware (Jetson), and building robust evaluation pipelines to ensure agent reliability and safety.

Core Competencies

Agentic AI: Multi-step Workflow Design, Function/Tool Calling, Task Decomposition, RAG & Context Management, Human-in-the-loop Systems.
Hybrid & On-Device AI: TensorFlow/Keras, PyTorch, ONNX Runtime, TensorRT, Quantization, PEFT, Edge Optimization (CPU/GPU).
Infrastructure & Eval: Vector Databases (Embeddings), Evaluation Pipelines, Benchmarking Metrics, CI/CD (GitLab/Terraform), Cloud (AWS/GCP/Vertex AI).
Leadership: Technical Direction, Cross-Functional Collaboration, Engineering Mentorship, Agile/Scrum, Product Strategy.

Selected Technical Talks

Hyperfast AI: Rethinking Design for 1000 tokens/s

AI Tinkerers Raleigh, Dec 2025
  • Analyzed the impact of ultra-low latency inference on agentic reasoning loops, demonstrating how high-speed token generation enables more robust self-correction and recovery strategies.

Apple's On-Device VLM: The Future of Multimodal AI

Conference Talk, Sep 2025
  • Technical deep dive into on-device multimodal agents, focusing on privacy-centric architectures that process vision and text locally without cloud dependency.

Professional Experience

ML Engineering Consultant / Technical Lead

2024 – Present
Remote

Technical leadership for diverse clients, architecting Agentic AI systems and Hybrid Infrastructure.

  • High-Speed AI Agent Framework (Cerebras Project): Developing a novel AI agent architecture that leverages ultra-fast inference (>1000 tokens/s) to enable real-time, multi-step task decomposition and self-correction. Designed "interruptible" agent workflows that handle ambiguity and user correction in real time, with human-like response latency in complex reasoning tasks.
  • On-Device Vision Agent & Hybrid Deployment (Matte Model Project): Architected a CPU-efficient, on-device portrait matting agent (MobileNetV2 backbone) for resource-constrained environments (Electron/Consumer Hardware). Replaced cloud-dependent SDKs with optimized ONNX Runtime inference, reducing latency and enabling offline functionality. Built end-to-end evaluation pipelines on GCP Vertex AI to benchmark model performance, robustness, and accuracy against industry datasets (P3M-10k).
  • Scale & Infrastructure: Designed RAG-based agent memory systems using vector embeddings to support personalized, context-aware product recommendations. Migrated NVIDIA Riva/Triton microservices to AWS, optimizing for hybrid cloud deployment and ensuring high availability for voice-enabled agent interactions.
  • Edge AI Optimization: Led optimization efforts for NVIDIA Jetson platforms, using TensorRT to deploy complex computer vision models in strictly resource-limited embedded environments.

Lead Engineer, AI R&D

2023 – 2024
Vidable.ai | Remote

Led the R&D team in developing and evaluating GenAI capabilities for production workflows.

  • Agent Workflow Evaluation: Established continuous evaluation pipelines to assess prompt robustness, model drift, and agent reliability across multiple LLM backends (OpenAI, Anthropic, Open Source).
  • Tooling & Automation: Built internal tools to automate the testing of prompt engineering strategies, significantly shortening time-to-adoption for new GenAI features.
  • Infrastructure: Architected Terraform/GitLab CI/CD pipelines to deploy inference microservices, supporting both cloud-based and local execution models.
  • Strategy: Collaborated with Product Managers and PhD researchers to define the technical roadmap, translating the latest advancements in diffusion models and LLMs into tangible product features.

Lead Engineer

2014 – 2023
Sonic Foundry | Remote

Engineering leadership focused on large-scale system design and cloud transformation.

  • Team Leadership: Mentored and led a distributed engineering team, fostering collaboration to deliver critical updates for a platform serving 100k+ users.
  • System Architecture: Designed and deployed a cloud-native archiving utility (AWS Lambda/Batch) that scaled dynamically to handle massive data loads, directly driving millions in revenue.
  • Innovation: Founded the company's internal AI reading group and led hackathons to explore early agentic workflows and automated content generation.
  • Early LLM Adoption: Prototyped agent-based search tools and successfully deployed GGML Llama models to serverless architectures for cost-effective inference.

Technology Specialist

2006 – 2014
NC State University
  • Designed and implemented embedded control systems for 200+ learning spaces, focusing on accessibility and user-centric automation.

Education & Certifications

NC State University | Electrical & Computer Engineering (Completed 75 Credit Hours)
Coursera Verified Certificates: Neural Networks for Machine Learning (Hinton), Image and Video Processing.