Floe is under active development and will enter Beta soon. We’d love for you to help us shape the world's best Lakehouse SQL compute engine.

Hello, Floe.


Floe is our new start-up: a SQL compute platform on the data lakehouse, for human and agentic users. I'm going to share what it's about and why it matters. I'm also going to talk about our logo, the cat. If you prefer watching to reading, please see the quick intro video on the home page instead, which is pretty straight to the point and also my first ever voiceover.

The problem

We spent a ton of time talking with enterprise data teams about their plans, and it's clear that all of them have, or want, a data lakehouse. What's important to them is having only one copy of the data, governance, and agentic access. Iceberg was in their future plans, yet Databricks customers were still mostly on Delta Lake, and others just had Parquet files on object storage.

Smart data teams realise that they still need to be able to query silver data for business users to get insights straight away, because filing data engineering tickets to pre-aggregate and materialise reports will frustrate the business. They know that they will have challenges in moving tier-1, operational business analytics and associated BI tools onto the lakehouse because it doesn't respond in the way the data warehouse did, nor is it as reliable.

Smart teams also know that LLMs that vibe-code Python apps for business users normally generate horrific query patterns they need to guard against; that LLM-generated SQL is even more ad-hoc than human ad-hoc SQL; and that AI data access must be monitored and ring-fenced in case the AI gets compromised and goes rogue.

We found that Databricks' SQL - with or without Photon bells and whistles - simply isn't up to the job, and neither is anything from Amazon or Microsoft. Not everyone wants to get locked into Snowflake or BigQuery to solve the problem, and they don't want to waste time coding and re-architecting data models to work with Trino.

Floe fixes this

Most importantly:

Floe makes the lakehouse respond to queries just like a hardcore, business critical data warehouse.

Financially auditable results - got that. Sophisticated SQL - check. 200-screens-of-repeated-expression awful SQL with CTEs, rich data models, fast needle-in-haystack queries, fast scans - we can do all that too. We have advanced query planning and statistics, a *DBC interface, Arrow for new stuff, and MCP for agents.

Floe will be a painless service - sign up, point it at your lakehouse, and go query till your heart's content.

How Floe works

There are actually around four major components in the Floe platform, each internally structured as micro-services:

Floecat is a meta-catalogue. It transparently takes metadata from lakehouses with Delta and Iceberg tables and enriches it with the missing statistics and SQL-type metadata needed for query planning and interoperability. It also lets you have one namespace to access many different lakehouses, even in different formats, if you'd like. It's open source and works with Trino and DuckDB as well.
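To make the meta-catalogue idea a bit more concrete, here's a minimal Python sketch. It is not Floecat's actual API: `TableMeta`, `MetaCatalog`, `mount`, `resolve` and `enrich` are hypothetical names, and a single row count stands in for the much richer statistics a real planner needs. It only illustrates two ideas from the paragraph above: one namespace over several lakehouses in different formats, and filling in a statistic only where the source doesn't provide one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableMeta:
    name: str
    table_format: str                # e.g. "iceberg" or "delta"
    row_count: Optional[int] = None  # None models a statistic the source lacks


@dataclass
class MetaCatalog:
    """Hypothetical single namespace over several lakehouse catalogues."""
    sources: Dict[str, Dict[str, TableMeta]] = field(default_factory=dict)

    def mount(self, prefix: str, tables: List[TableMeta]) -> None:
        # Register one lakehouse's tables under a namespace prefix.
        self.sources[prefix] = {t.name: t for t in tables}

    def resolve(self, qualified_name: str) -> TableMeta:
        # "sales.orders" -> the "orders" table in the lakehouse mounted as "sales".
        prefix, table = qualified_name.split(".", 1)
        return self.sources[prefix][table]

    def enrich(self, qualified_name: str, counted_rows: int) -> TableMeta:
        # Fill in the statistic only if the source did not supply it.
        meta = self.resolve(qualified_name)
        if meta.row_count is None:
            meta.row_count = counted_rows
        return meta


catalog = MetaCatalog()
catalog.mount("sales", [TableMeta("orders", "iceberg")])
catalog.mount("ops", [TableMeta("events", "delta", row_count=500)])
catalog.enrich("sales.orders", counted_rows=1200)  # missing, so it gets filled in
catalog.enrich("ops.events", counted_rows=999)     # already present, left alone
```

Both tables are now reachable through one catalogue object, regardless of which format they live in.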

FloeSQL contains our query planner and optimiser, caching storage engine, query compiler and executor. This part of the stack uses robust Yellowbrick IP with enhancements. The query planner is cost-based, has a full library of rewrites, and applies rich optimisations to complex join topologies using advanced statistics and AI. The caching storage engine, called Catalyst, transcodes columnar Parquet files into granular, indexed fragments cached on NVMe storage. The query compiler and executor take plan trees and convert them to vectorized executable code using LLVM.
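Why do indexed fragments help with needle-in-haystack queries? Here's a toy Python sketch of the general idea. It makes no claim about how Catalyst actually works: the per-fragment min/max index is an assumption chosen for illustration, and `Fragment`, `build_fragments` and `find_equal` are hypothetical names. The point is simply that a small index per fragment lets a lookup skip most of the data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    values: List[int]
    lo: int  # smallest value in this fragment
    hi: int  # largest value in this fragment


def build_fragments(column: List[int], size: int) -> List[Fragment]:
    # Split one column into fixed-size fragments and index each by min/max.
    fragments = []
    for start in range(0, len(column), size):
        chunk = column[start:start + size]
        fragments.append(Fragment(chunk, min(chunk), max(chunk)))
    return fragments


def find_equal(fragments: List[Fragment], needle: int) -> Tuple[int, int]:
    # Return (matching rows, fragments actually scanned).
    hits = 0
    scanned = 0
    for frag in fragments:
        if frag.lo <= needle <= frag.hi:  # the index says a match is possible
            scanned += 1
            hits += frag.values.count(needle)
    return hits, scanned


column = list(range(1000))
fragments = build_fragments(column, size=100)
hits, scanned = find_equal(fragments, needle=742)  # hits == 1, scanned == 1
```

With ten fragments of 100 values each, the lookup touches one fragment and skips the other nine. On unsorted data the ranges overlap more, so the saving depends on how the data is laid out.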

FloeWLM helps make sure business objectives can be met even under varying workloads. It allows compute resources to be segmented to guarantee quality of service, and makes sure that users who do awful things from Python or produce evil, long-running queries are penalty-boxed so they don't interfere with operational workloads.
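As a rough illustration of segmenting compute and penalty-boxing, here's a small Python sketch. It is an assumption-laden toy, not the product's design: `Pool`, `WorkloadManager`, `admit` and `review` are hypothetical names, slots stand in for real compute resources, and elapsed time is the only rule used to decide that a query is misbehaving.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pool:
    name: str
    slots: int  # how many queries this pool may run at once
    running: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.running) < self.slots


class WorkloadManager:
    def __init__(self, pools: List[Pool], penalty_pool: str, max_seconds: float):
        self.pools = {p.name: p for p in pools}
        self.penalty_pool = penalty_pool
        self.max_seconds = max_seconds

    def admit(self, query_id: str, pool_name: str) -> bool:
        # Run the query only if its pool has a free slot.
        pool = self.pools[pool_name]
        if not pool.has_capacity():
            return False  # a real system would queue the query instead
        pool.running.append(query_id)
        return True

    def review(self, query_id: str, pool_name: str, elapsed_seconds: float) -> str:
        # Move a query that has run too long into the penalty pool.
        # For brevity the penalty pool's own slot limit is not checked here.
        if pool_name != self.penalty_pool and elapsed_seconds > self.max_seconds:
            self.pools[pool_name].running.remove(query_id)
            self.pools[self.penalty_pool].running.append(query_id)
            return self.penalty_pool
        return pool_name


wlm = WorkloadManager(
    [Pool("operational", slots=2), Pool("penalty", slots=1)],
    penalty_pool="penalty",
    max_seconds=30.0,
)
wlm.admit("dashboard-1", "operational")
wlm.admit("python-app-7", "operational")
wlm.review("python-app-7", "operational", elapsed_seconds=45.0)  # moved to "penalty"
```

Once the long-running query is moved, its slot in the operational pool is free again, so the next dashboard query doesn't have to wait behind it.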

When will we have this? What about the cat?

We're in the process of building out Floe now, and we'll get to Beta later this year. I'll be posting a lot more about the tech, the team, the culture and how we solve all sorts of interesting technical problems. As well as perhaps the odd video.

Oh, about the cat. Didn't get to that. We want to build a brand that looks like something you'd like to wear, not yet another boring blue cloud company logo. We've worked with some amazing fashion, brand and typography designers to put together a look and feel that's distinctly non-tech. After all, corporate apparel is boring. I'll be sharing more on this over the coming weeks too.

Do follow Floe on LinkedIn, our company blog, or my personal blog.

This article was written by me, not by GPT.

Author
Neil