Ron Jailall

Creative Problem Solver & ML Engineer

Raleigh, NC | (608) 332-8605

rojailal@gmail.com

https://ironj.github.io/

Research Profile

Creative problem solver and ML engineer with 15+ years of experience bridging foundational research and scalable product experiences. Expert in multimodal LLMs, human-centric computer vision, and on-device inference. Proven ability to adapt state-of-the-art research (Gaussian Splatting, VLMs) to run efficiently on Apple Silicon using Core ML, Metal, and Swift. Passionate about advancing human understanding through AI and shipping pioneering experiences on visionOS and iOS.

Core Competencies

Multimodal AI: Vision Language Models (VLMs), Text-to-Image/3D, RAG Pipelines, Audio/Video Semantic Understanding.
Computer Vision: Human Matting & Segmentation, 3D Reconstruction (Gaussian Splatting), Depth Estimation, Real-Time Tracking.
Apple Ecosystem: visionOS, Core ML, Metal Performance Shaders, Swift, RealityKit, ARKit.
Research & Engineering: PyTorch, TensorFlow 2, Model Quantization, Latency Optimization, Technical Writing & Presentation.

Research & Implementation Highlights

On-Device Multimodal Intelligence (Research Focus)

  • Vision Language Models on Edge: Conducted deep-dive technical research into deploying Multimodal Large Language Models on Apple Silicon, analyzing memory bandwidth constraints and quantization strategies to enable local understanding of images and text.
  • Privacy-Centric Architectures: Designed architectures for local-only processing of sensitive user data (screen recording, camera inputs), aligning with Apple's privacy-first values for human-centric features.

visionOS Volumetric Reconstruction (3D/Video CV)

  • State-of-the-Art Implementation: Adapted the SHARP (Gaussian Splatting) research model to run natively on visionOS, converting a research-grade pipeline into a real-time, interactive AR experience on Vision Pro.
  • Metal & Physics Integration: Engineered custom Metal shaders to add interactive physics ("jiggle" dynamics) to reconstructed volumetric scenes, pushing the boundary of how users interact with static 3D memories.
  • Performance Engineering: Overcame browser sandboxing limits by re-architecting the solution from WebGL to Core ML/Swift, ensuring the high-throughput performance required for immersive, Apple-quality experiences.

High-Fidelity Human Understanding (Matte Model)

  • Human-Centric CV: Architected a MobileNetV2-based portrait matting model specifically for accurate human segmentation, a core component of "understanding" users in video streams.
  • Efficiency Optimization: Retrained and pruned the model architecture for CPU-focused inference, achieving high-fidelity alpha mattes suitable for real-time background replacement on consumer hardware.

Professional Experience

ML Engineering Consultant / Applied Researcher

2024 – Present
Remote

Advancing practical applications of Multimodal AI and Computer Vision for diverse clients.

  • Multimodal Agentic Workflows: Developing agentic AI systems that integrate text and vision modalities to solve complex reasoning tasks, utilizing high-speed inference (>1000 tokens/s) to enable fluid, human-like interaction loops.
  • Video & Sensor Fusion: Architected a multi-view camera tracking system on NVIDIA Jetson devices, synchronizing diverse sensor inputs for precise real-time environmental understanding.
  • Cross-Functional Collaboration: Partnering with hardware and software teams to migrate research-grade NVIDIA Riva/Triton microservices to scalable cloud architectures, ensuring robust performance for end-users.

Lead Engineer, AI R&D

2023 – 2024
Vidable.ai | Remote

Led the exploration and productization of Generative AI and Multimodal models.

  • Foundational Model Evaluation: Collaborated with PhD researchers to evaluate the latest advancements in Diffusion Models and LLMs, establishing benchmarks for their applicability to video and audio domain tasks.
  • Research to Product: Translated early-stage research concepts into concrete product features, building React-based prototypes powered by multimodal backends to demonstrate feasibility to stakeholders.
  • Technical Communication: Hosted weekly seminars to disseminate findings on the latest AI advancements (e.g., CVPR/NeurIPS papers) to the broader engineering organization, fostering a culture of continuous learning.

Lead Engineer

2014 – 2023
Sonic Foundry | Remote

Engineering leadership focused on large-scale video processing and intelligent retrieval.

  • Video Understanding: Prototyped neural search capabilities for massive video archives using segmentation and classification algorithms (PyTorch/TensorFlow), enabling semantic retrieval of video content.
  • Machine Learning at Scale: Built and deployed U-Nets for image analysis and used Amazon SageMaker to scale model training pipelines.
  • Innovation Leadership: Founded the company's AI reading group and led hackathons to explore early applications of deep learning in the video domain.

Selected Technical Talks

Hyperfast AI: Rethinking Design for 1000 tokens/s

Dec 2025
  • Explored how ultra-low-latency inference changes the design paradigm for agentic systems and human-computer interaction.

Apple's On-Device VLM: The Future of Multimodal AI

Sep 2025
  • Presented on the convergence of LLMs and computer vision on edge devices, discussing the future of "seeing" AI assistants.

Education & Certifications

NC State University | Electrical & Computer Engineering (75 Credit Hours)
Coursera Verified Certificates:
  • Neural Networks for Machine Learning (Geoffrey Hinton) | ID: 3MJACUGZ4LMA
  • Image and Video Processing | ID: E9JX646TTS