Upcoming Talk: When AI Gets Hyperfast - Rethinking Design for 1000 tokens/s
I’ll be speaking at AI Tinkerers Raleigh about what happens when AI inference becomes fast enough to fundamentally change how we think about application design.
The Talk
When inference engines like Cerebras can generate up to 1000 tokens per second, we're not just talking about faster responses; we're witnessing a paradigm shift in how we design and interact with AI-powered tools.
I'll be expanding on a project I started a few months ago using the Cerebras inference engine to demonstrate how models running at this speed demand new ways of thinking about AI app design and UX. The demo shows what an OS might look like when AI tools are generated on demand.
Why This Matters
Think about it: when AI can generate entire tools on a button click, the traditional boundaries of software design dissolve. You stop pre-programming every interaction and start generating interfaces in real-time. The AI becomes synchronous with the interaction itself.
This isn't about making existing apps faster. It's about what becomes possible when speed crosses a threshold: when AI stops being something you wait for and becomes something woven into the fabric of the interaction loop.
A Glimpse of the Future
The demo explores what computing looks like when:
- Tools and interfaces are generated on demand rather than pre-built
- AI lives inside the interaction loop, not as an async background process
- Every click can invoke an AI that responds faster than you can perceive the delay
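To make that last point concrete, here's a rough back-of-the-envelope sketch. The token count for a generated tool and the 50 tok/s comparison rate are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope latency for generating a small UI "tool" on demand.
# Token counts and decode rates below are illustrative assumptions.

def generation_time_s(tool_tokens: int, tokens_per_s: float) -> float:
    """Seconds to stream a complete tool definition at a given decode rate."""
    return tool_tokens / tokens_per_s

TOOL_TOKENS = 300  # assumed size of a small generated interface spec

for rate in (50, 1000):  # an assumed typical decode rate vs. a 1000 tok/s engine
    t = generation_time_s(TOOL_TOKENS, rate)
    print(f"{rate:>5} tok/s -> {t:.2f} s to generate the tool")
# ->    50 tok/s -> 6.00 s to generate the tool
# ->  1000 tok/s -> 0.30 s to generate the tool
```

Six seconds is a spinner you sit through; 0.3 seconds is close enough to instant that the tool feels like it was already there when you clicked.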
We’re at the beginning of a shift. As inference speeds continue to improve, we’ll see entirely new categories of software that simply couldn’t exist before. This talk is about imagining—and demonstrating—what that future looks like.
Want to work with me? Send me an email.