Apache Spark Performance Optimization
KwikQuery LLC is a Beaverton, Oregon-based software company focused on solving real performance problems in Apache Spark — specifically the ones that don't show up in standard benchmarks but cause serious pain in production.
Asif Hussain Shahid — Co-Founder & CTO
Founder · Query Engine Architect
I've spent 27 years working on data systems, focused on query engine internals, low-latency execution, and the kind of performance work that determines whether a system holds up under real-world pressure. That journey led me to found KwikQuery LLC and build TabbyDB, a high-performance fork of Apache Spark that tackles query compilation and runtime performance problems at scale.
My background is in Chemical Engineering from IIT Bombay, but I moved into software early. Much of my foundation in performance engineering was built at GemStone Systems working on GemFire, the distributed data platform later open-sourced as Apache Geode. GemFire was an environment where low latency wasn't a goal, it was the culture, and that shaped how I approach every system I've worked on since. During that time I made significant enhancements to GemFire's OQL engine: extending it to evaluate join queries using range indexes via high-performance nested loop execution, and adding colocated join and nested query support, each of which required substantial rework of the engine's core. I also worked on disk-based persistence and the client-server topology.
From there I worked at VMware, Dell, Pivotal, and SnappyData (all by way of the acquisitions that followed VMware's purchase of GemStone Systems), then Tibco (after its acquisition of SnappyData), Workday, and Cloudera, building approximate query processors, off-heap memory storage layers, and Spark optimizer improvements along the way.
At Workday, a Constraint Propagation problem in Apache Spark's Catalyst optimizer had been identified as a serious bottleneck — complex plans were taking hours to compile. I devised a new algorithm that brought that down from 8 hours to under 10 minutes, work recognized with Workday's Tech Wizard award and co-presented at the Databricks Spark Summit in 2022. At Cloudera I continued at the engine level, resolving deep performance and functionality bugs for customers including Bank of America, Amazon, and JPMC.
TabbyDB applies all of that to fix problems stock Spark leaves unsolved — compile-time blowup from constraint explosion and uncapped query trees, redundant optimizer rule application, excessive Hive Metastore calls, and runtime inefficiency from the absence of Broadcast Hash Join key pushdown for file pruning on non-partitioned columns. TPC-DS benchmarks show 13% improvement on AWS nodes and 17% on Ampere M1 nodes. Combined with Apache Iceberg, early testing shows a 46% improvement, with larger benchmarks underway.
Outside of work, I've volunteered as a shelter assistant at the Cat Adoption Team in Beaverton for the past three years, going in every couple of weeks. I'm based in Beaverton, Oregon.
If you're dealing with Spark performance or compilation issues, I'd be happy to talk.
Taha Hussain — Co-Founder
Co-Founder · GTM & Operations
I co-founded KwikQuery with Asif to translate his engine-level work on Apache Spark into a product the industry can deploy and depend on. I lead every function outside of engineering: sales, go-to-market, partnerships, fundraising, and operations.
By day I'm a Technical Program Manager at Daimler Trucks North America, running cross-functional engineering programs across multiple concurrent vehicle and software lines. I hold a degree in Electrical & Computer Engineering from Oregon State University and I'm based in Beaverton, Oregon.
If you're evaluating TabbyDB for your team and want to talk procurement, deployment, or support, I'm the right place to start.
Beaverton, Oregon, USA · kwikquery.com