Most readers know that at Floe, we're busy building a SQL compute service for the data lakehouse. Floe is built to soak up ad-hoc queries from humans and AIs alike, delivering results reliably and with consistent quality of service no matter the workload mix.
We just celebrated a major development milestone on the road to delivering the first version of our product: getting end-to-end TPC-H queries running against Delta Lake and Iceberg lakehouses. We're using a micro-services architecture, which I'll outline in a subsequent blog post, and an end-to-end flow traverses dozens of these little programs written in Golang, Rust, C, C++ and Java (at least). We went through creating a new account, signing up, importing a data catalogue or two (in this case, Delta and Iceberg views of the same data hosted on AWS Glue), creating a small compute cluster, firing up a query editor, and executing some queries. All of this ran in our Production AWS account and delivered correct results.
This milestone matters because we can now test the correctness of the entire system, enforce continuous integration and deployment of micro-services to Production, and begin building system regression tests. From here we can build out new functionality without worrying about regressions.
To pull this together, the team implemented a wide range of functionality, listed below, each area of which is further subdivided into multiple micro-services:
This took just over two months and came alive on 6th March, approximately three weeks ahead of schedule. AI tools like Claude Code and Codex, wielded carefully, have accelerated everything we do, and we're running faster and more nimbly as an early-stage business where everyone's having fun cranking out code.
To celebrate, we drank plenty of beer and wine at the office bar, ate too much cheese, and threw a party at Amazonico. One team member (who preferred to remain anonymous) even quietly got married. Congrats!
We're heads-down planning our next major milestone, the key component of which is a new scale-out scan-planning service that we expect to set some world records pruning Parquet files on object stores. We're going to take a page out of the ClickHouse Cloud journey and "dog-food" Floe running alongside Databricks for OpenTelemetry, as well as for our marketing warehouse and our QA warehouse. These projects also require us to implement the VARIANT Iceberg type for interoperability, as a replacement for the old Yellowbrick JSON type.
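To give a flavour of what scan planning does, here's a minimal sketch in Go of the core pruning idea: use the min/max column statistics recorded in table metadata to skip whole Parquet files that provably cannot match a predicate, without ever reading them from the object store. The types and function names here are illustrative assumptions for this post, not Floe's actual API.

```go
package main

import "fmt"

// ColumnStats holds min/max values for one column, as recorded in
// Parquet footers and surfaced through table-format metadata.
type ColumnStats struct {
	Min, Max int64
}

// FileMeta is a simplified view of one data file in a table snapshot.
type FileMeta struct {
	Path  string
	Stats ColumnStats
}

// pruneFiles keeps only files whose [Min, Max] range could contain a
// value satisfying the predicate `col BETWEEN lo AND hi`. Files whose
// range lies entirely outside the predicate are skipped unread.
func pruneFiles(files []FileMeta, lo, hi int64) []FileMeta {
	var kept []FileMeta
	for _, f := range files {
		if f.Stats.Max < lo || f.Stats.Min > hi {
			continue // provably no matching rows: skip the file
		}
		kept = append(kept, f)
	}
	return kept
}

func main() {
	files := []FileMeta{
		{"part-0.parquet", ColumnStats{1, 100}},
		{"part-1.parquet", ColumnStats{101, 200}},
		{"part-2.parquet", ColumnStats{201, 300}},
	}
	// Predicate: col BETWEEN 150 AND 250 — only part-1 and part-2 survive.
	for _, f := range pruneFiles(files, 150, 250) {
		fmt.Println(f.Path)
	}
}
```

The real service has to do this across millions of files, in parallel, against stats fetched from object storage, which is where the scale-out part comes in.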
Stay tuned for more updates!
- The Floe team