Why You Should Join Opaque

Multi-party computation is the future of business intelligence.

Aug 29, 2022

Welcome to “Why You Should Join,” a bi-weekly newsletter for engineers highlighting early-stage startups on track to becoming generational companies.

As engineers ourselves, we know how difficult it is to pick the right startup to join. Doing your own analysis for every TechCrunch article, recruiter InMail, or VC tweet would be impossible. Let us help you out instead :)

Every other Monday, we cut through the noise and use thorough research, rigorous analysis, and inside information we’ve hustled to get to present you with an early-stage startup we believe will become a generational company. We go deeper than any other source out there to help ambitious new grads, FAANG veterans, and experienced operators find the right rocket ship to join. Sound interesting? Join the family and subscribe here:


Data has never been more valuable. A data-driven organization is 23 times more likely to acquire customers, 6 times more likely to retain customers, and 19 times more likely to be profitable than one that isn’t. Knowing this, a full 81% of American businesses tried to expand their data analytics teams last year. Job openings for data scientists are up 480% since 2016, and the world is projected to spend 684 billion on big data and business analytics by 2030.

Everyone knows that if you want to grow your business, data is the place to start.

It’s not surprising, then, that 70% of businesses are interested in data sharing agreements. In a data sharing agreement, different parties pool their data to analyze together. The idea is simple: larger datasets yield more insights, benefiting all participants. In 2020, some of the world’s largest pharmaceutical companies pooled their data to accelerate development of the COVID-19 vaccine, saving countless lives in the process. In 2021, teams participating in data pooling agreements produced three times more “measurable economic benefit” than ones who didn’t. Done right, data sharing is one of the most impactful things a company can do.

The problem is, “doing it right” is almost impossible. For good reason, stricter privacy laws and regulations have made data sharing a non-starter in most industries. For instance:

Financial institutions could fight fraud much more effectively by sharing data with each other. However, strict internal policies designed to avoid compliance violations make this impossible.
Healthcare institutions could improve patient outcomes by sharing patient data with each other. However, patient privacy laws and geography-based data use regulations make this impossible.

The fundamental problem is simple: data sharing violates confidentiality. Doing anything meaningful with data (training models, running analytics) requires decrypting it, meaning your data is exposed to everyone else in the pool. For competitive, regulatory, and ethical reasons, this is usually a dealbreaker.

But what if you could get the benefits of a pooled dataset without exposing your data to everyone else in the group? What if everyone’s data remained encrypted while in use, so no one could access data they didn’t contribute?

That would allow you to share data without any of the competitive, regulatory, or ethical repercussions mentioned earlier. That would bring data pooling to the masses, unlocking tremendous value for every company in every industry. That would be a paradigm shift in data analytics.

But that would be impossible, right?

Wrong, because that’s exactly what Opaque has built.

Background

Opaque is building a platform for secure multi-party computation, which is the formal term for analytics on pooled data. Concretely, their product allows you to run Apache Spark jobs on shared datasets while ensuring people can’t access data they didn’t contribute.

Opaque delivers on this seemingly impossible value proposition by leveraging several recent breakthroughs in cryptography and processor design. Specifically, they run their Spark jobs in cryptographically fortified Intel SGX enclaves. You can think of SGX enclaves as “vaults” built into Intel chips: once you load a program and some data into one, the hardware itself prevents unauthorized parties from peeking in. You’d normally have to rewrite your entire program with a 186-page-long set of opcodes to use enclaves, but Opaque handles all of that for you. From your perspective, it’s exactly the same as running a normal Spark job.

POV: You’re trying to write your own enclave program.

Concretely, Opaque creates value for businesses in three ways:

Simple Sharing
Because no one (including Opaque) can access data they didn’t contribute, joining data pools is frictionless: you won’t have to worry about exposing sensitive information or breaking privacy laws ever again.
Simple Scaling
Shared datasets are big datasets — processing them efficiently requires a lot of enclave-equipped CPUs. Opaque handles the challenges of secure inter-enclave communication and orchestration for you so that scaling data pools is frictionless.
Simple Security
Because of how strong Opaque’s system is, securing sensitive data is frictionless: even if you don’t join a data pool, analyzing your most sensitive information with Opaque significantly lowers the threat of it being hacked or stolen.

As the most secure and performant platform on the market, Opaque is already being used by some of the world’s largest organizations. Healthcare institutions are using them to share patient information during clinical trials and financial institutions are using them to detect fraud and money laundering. Current customers include some of the largest financial, healthcare, and technology companies in the country.

Opaque was founded in 2021 as a spinout from UC Berkeley’s RISELab by Dr. Ion Stoica, Dr. Raluca Ada Popa, Dr. Rishabh Poddar, Dr. Wenting Zheng, and Chester Leung. Previously, the team worked on the open-source MC2 project, a platform for running secure analytics and machine learning on encrypted data. Collectively, the founding team has earned 14 degrees in electrical engineering and computer science, published almost 200 papers, and co-founded 4 deep-tech startups, one of which was Databricks.

On the grounds of their impressive traction, technology, and team, Opaque raised a 22 million dollar Series A in 2022 (actually while we were researching this piece!) led by Walden Catalyst. They previously raised a 9.5 million dollar seed round in 2021 (led by Intel Capital) and are also backed by Storm Ventures, Thomvest Ventures, Race Capital, FactoryHQ, and The House Fund.

Opaque’s revolutionary technology, incredible product, and stacked team make them an exceptional startup. But are they a category-defining business?

We think they have the potential to be.

In this piece, we’ll dig deeper to see why this is the case. We’ll look at how they’re targeting a massive, globally important market. We’ll look at how they’re well-positioned to become a leader in that market. We’ll look at how they stack up against the competition, and why their team is perfect to execute on this opportunity. And we’ll do it all as thoroughly as we can.

Ready? Let’s begin.

Opportunity

We’ll begin our analysis of Opaque like we do for all of our companies, from a first principle:

In order for a company to become massive, it must lead a massive, growing market.

Many companies with clear product-market fit don’t become truly massive because they don’t meet this condition. Thus, we must establish two things:

Opaque operates in a massive, growing market.
Opaque will become a leader in that market.

A Massive, Growing Market

Broadly speaking, Opaque operates in the market for confidential computing, the set of technologies protecting “data in use.” Empirically, this is a massive, growing market: Everest Group reports that global spending on confidential computing is set to grow at a CAGR of 90-95% over the next five years to reach 54 billion by 2026. Spending on multi-party computation and confidential analytics, the specific sub-segment that Opaque focuses on, is today about a fifth of that market but set to grow disproportionately at a CAGR of 150% to reach about 20 billion by 2026.
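To get a feel for what growth rates like these mean, it helps to run the compounding arithmetic yourself. The snippet below is just a back-of-the-envelope sketch: the function names are ours, and the only input taken from the figures above is the 150% CAGR. A CAGR of 150% means multiplying by 2.5 every year, which compounds to roughly 98x over five years.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Inverse of the above: the CAGR that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# A 150% CAGR is written as 1.5, so the market multiplies by 2.5 each year.
one_year = growth_multiple(1.5, 1)    # 2.5
five_years = growth_multiple(1.5, 5)  # 2.5 ** 5 = 97.65625
print(f"1 year: {one_year}x, 5 years: {five_years}x")
```

The second helper goes the other way: give it a starting size, an ending size, and a number of years, and it returns the growth rate that connects them.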

Why is the market so big? Intuitively, multi-party computation and confidential analytics is a massive market because it sits at the exact intersection of two even larger industries:

The global market for big data and analytics is set to reach 684 billion by 2030. The world needs tools to analyze its data.
The global market for big data security is set to reach 115 billion by 2030. The world needs tools to protect its data.

By securing existing forms of data analysis and enabling new ones, tools for multi-party computation and confidential analytics uniquely meet the intersecting needs of data science and cybersecurity teams at banks, hospitals, and defense companies around the world. That’s a lot of potential buyers.

Why is the market growing so fast? Intuitively, multi-party computation and confidential analytics is a growing market because it sits at the exact intersection of several massive tailwinds:

Market tailwinds: the parent markets of big data analytics and big data security are experiencing massive growth:
- The global market for big data and analytics will grow at a CAGR of 13.5% through 2030. This makes sense given how the amount of digital data created will grow at a CAGR of 23% through 2025: the amount of data to analyze is growing, and our ability to analyze it is improving.
- The global market for big data security will grow at a CAGR of 17.5% through 2029. This makes sense given how the number of recorded data breaches reached an all-time high in 2021 and affected 5.9 billion accounts in total: the amount of data to protect is expanding, and the number of people trying to steal it is growing.
Regulatory tailwinds: by the end of 2024, 75% of the world’s population will have its personal data covered under modern privacy regulations, up from 10% in 2020. Concretely, lawmakers in 137 countries are writing laws that “achieve adequacy” with the EU’s GDPR, which has become the “de facto global standard” for data privacy and security. As explicitly noted in Article 32 of GDPR, securing data while it’s being processed (confidential analytics) is a critical part of this.
For companies that help secure your data, this is great news. But for companies that are more liberal with your data…

Behavioral tailwinds: more than ever, businesses are weighing cybersecurity as a critical factor when making decisions:
- By 2025, 60% of organizations will use cybersecurity risk as a primary determinant in conducting third-party transactions and business engagements.
- By 2026, 50% of C-level executives will have cybersecurity risk-related performance requirements built into their employment contracts.
As a result, 60% of large organizations will use at least one privacy-enhancing computation technique (e.g. confidential analytics) by 2025.
Technological tailwinds: as mentioned earlier, the rise of multi-party computation and confidential analytics is the result of several breakthroughs in cryptography and processor design. These breakthroughs have only now reached commercial maturity: in the past three years, all major chip (Intel with SGX, AMD with SEV, ARM with TrustZone) and cloud (Azure with SGX, GCP with SEV, AWS with Nitro) providers have introduced support for hardware enclaves, enabling the software built on top of them.

Any one of these tailwinds would independently drive a market towards massive growth. Together, however, we have a perfect storm of conditions allowing the market for multi-party computation and confidential analytics to more than double in size each year.

A Leader in the Market

Growing at a CAGR of 150%, multi-party computation and confidential analytics stands to be the fastest growing market we’ve ever covered. Although it’s early, there is sure to be significant competition. Even so, we’re confident that Opaque will end up as a leader in the market because:

Opaque has the best technology.
Opaque has the best product.
Opaque has the best team.

The Best Technology

The two most important metrics around multi-party computation are security and efficiency: security because you don’t want unauthorized parties viewing your data, and efficiency because of how large pooled datasets are. On both metrics, Opaque’s technology is best-in-class:

Security
Hardware enclaves enforce confidentiality against direct attempts at reading protected data, but they remain vulnerable to side-channel attacks: techniques for indirectly reading protected data. By observing an enclave program’s indirect effects on the system (e.g. OS page access patterns, power usage patterns, timing patterns), clever attackers can extract confidential text files, images, and even AES/RSA keys.
It’s like how, although you can’t read your roommate’s mind, you can deduce how badly they got dumped based on how loud they’re playing Olivia Rodrigo and how much Taco Bell they’re eating.
Opaque uniquely defends against side-channel attacks by adding a layer of cryptographic security on top of hardware enclaves. Using cryptographic black magic, they’ve developed their platform such that no indirect effects on the system are visible at runtime, effectively nullifying known side-channel attacks. To get a bit technical, Opaque is an oblivious program built on a set of oblivious relational algebra operators proposed by Professor Wenting Zheng (Opaque’s Chief Scientist) in her PhD thesis from UC Berkeley. These fundamentally new cryptographic primitives are what prevent attackers from extracting confidential information via side-channels.
We’d like to emphasize that this represents a categorical difference between Opaque and other players in the space. While there are other systems that leverage hardware enclaves, Opaque alone has invented new cryptography on top of them to go above and beyond in preventing side-channel attacks. This easily makes them the most secure.
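If “oblivious” sounds abstract, here is a toy sketch of the idea in Python. To be clear, this is our own illustration and not Opaque’s operators: the row schema, the DUMMY placeholder, and the trace list are all made up for the example. The point is simply that an oblivious filter reads and writes the same slots in the same order no matter what the data says, while an ordinary filter leaks how many rows matched.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, int]   # (account_id, amount): a made-up schema for illustration
DUMMY: Row = (-1, 0)    # placeholder row that carries no information


def leaky_filter(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Ordinary filter: the output length reveals how many rows matched."""
    return [row for row in rows if pred(row)]


def oblivious_filter(
    rows: List[Row],
    pred: Callable[[Row], bool],
    trace: Optional[List[Tuple[str, int]]] = None,
) -> List[Row]:
    """Read every row once and write exactly one output slot per input row.

    Non-matching rows become DUMMY, so the sequence of reads and writes
    depends only on len(rows), never on the values inside the rows.
    """
    out: List[Row] = []
    for i, row in enumerate(rows):
        if trace is not None:
            trace.append(("read", i))
        # A real implementation would use a branchless select here; Python's
        # conditional expression is only a stand-in for that idea.
        out.append(row if pred(row) else DUMMY)
        if trace is not None:
            trace.append(("write", i))
    return out


def is_large(row: Row) -> bool:
    return row[1] > 1000


# Two hypothetical datasets of the same size but very different contents.
bank_a = [(1, 50), (2, 5000), (3, 20)]
bank_b = [(7, 9000), (8, 8000), (9, 7000)]

trace_a: List[Tuple[str, int]] = []
trace_b: List[Tuple[str, int]] = []
out_a = oblivious_filter(bank_a, is_large, trace_a)
out_b = oblivious_filter(bank_b, is_large, trace_b)

print(trace_a == trace_b)  # identical access patterns for both datasets
print(len(leaky_filter(bank_a, is_large)), len(leaky_filter(bank_b, is_large)))
```

In a real system the output rows would also be encrypted, so an observer couldn’t tell dummy rows from real ones, and pure Python gives no actual timing guarantees. Treat this as a picture of the access-pattern idea only.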
Efficiency
Compared to running unencrypted Spark jobs on traditional hardware, executing them securely across multiple parties will naturally add some overhead. How much overhead can quickly become a problem given how large shared datasets get.
Thankfully, Opaque has the most efficient solution on the market. The new cryptographic primitives leveraged in Opaque SQL’s original research version allow it to run in a distributed, parallelized, and oblivious fashion up to 2300x faster than previous state-of-the-art implementations. The closed-source version of Opaque SQL (used in production) is a further 20x faster than that, bringing total overhead to less than 1x relative to vanilla (unencrypted) Spark SQL.
As we will see later in the piece, this makes Opaque at least 2-3 orders of magnitude faster than competing technologies.

The Best Product

To successfully productize their technology, Opaque must create real value for their core users: data scientists. Thankfully, this is an audience the Opaque team knows well. Co-founder and board member Ion Stoica’s prior startups Anyscale and Databricks (both RISELab spin-outs, both unicorns) were also built on helping data scientists do their jobs better. Opaque’s laser focus on this user group easily makes them the best product for multi-party computation and confidential analytics. By tailoring their technology and product around this specific use case, they’ve made their user experience truly frictionless:

Frictionless Adoption
Enabling collaborative analytics requires a lot more than just running Apache Spark in an enclave. There’s a significant amount of peripheral infrastructure required to make this particular use case work: keys have to be managed, deployments have to be attested, policies have to be enforced, and data has to be piped in and out to the correct parties. Further, all this has to be done with the same degree of confidentiality and efficiency.
No data scientist would want to set this up themselves. That’s why Opaque handles it for them under the hood, making adoption frictionless.
Frictionless Usage
Once set up, Opaque is super easy to use. As just a confidential version of Apache Spark, users can use the same syntax to run the same jobs they’re used to, just in a confidential and multi-party setting. If you’re familiar with Spark or Databricks, Opaque will fit right into your workflow.
Scale isn’t an issue either — as mentioned earlier, Opaque handles the challenges of secure inter-enclave communication and orchestration under the hood, making usage frictionless.

Of course, data scientists aren’t the only stakeholders Opaque creates value for. Opaque also helps cybersecurity worry less about data breaches, legal worry less about compliance violations, and engineering worry less about building security features in-house. That’s a lot of potential entry points when selling to banks, hospitals, and government institutions.

The Best Team

Inventing new cryptography and commercializing it successfully is no small feat. When you look at the elite academic and entrepreneurial pedigree of the team behind Opaque, however, things start to make sense:

Rishabh Poddar - Co-Founder, CEO
PhD in Computer Science from UC Berkeley, where he worked at RISELab and was advised by Professor Popa. Previously worked at IBM, BCG, Microsoft. Published 18 papers on applied cryptography and systems security.
Ion Stoica - Co-Founder, Board Member
Professor of Computer Science at UC Berkeley, co-founder and co-director of RISELab. Previously co-founder and CEO of Databricks (currently chairman), co-founder of Anyscale, and co-founder of Apache Spark/Apache Mesos. Holds a Ph.D. in Electrical & Computer Engineering from Carnegie Mellon.
Raluca Ada Popa - Co-Founder, President
Associate Professor of Electrical Engineering and Computer Science at UC Berkeley, co-founder and co-director of RISELab. Previously co-founder and CTO of PreVeil, an MIT spin-out. Holds a Ph.D. in Computer Science (plus three other degrees) from MIT, received 2021’s ACM Grace Murray Hopper award.
Wenting Zheng - Co-Founder, Chief Scientist
Assistant Professor of Computer Science at Carnegie Mellon. Holds a Ph.D. in Computer Science from UC Berkeley (jointly advised by Professors Stoica and Popa), where she worked at RISELab. Also holds a bachelor’s and M.Eng in Electrical Engineering and Computer Science from MIT.
Chester Leung - Co-Founder, Principal Engineer
B.S. and M.S. in Computer Science from UC Berkeley, where he worked at RISELab and was advised by Professor Popa. One of the original implementers and maintainers of MC2. Previously worked at Facebook.
Jay Harel - VP Product
Previously Senior Director of Product at Illumio, VP Product at Kollective Technology, and co-founder at Tripio/Intervu, two consumer startups. Holds an MBA from Cornell.

Beyond collectively holding 14 degrees in electrical engineering and computer science, publishing almost 200 papers, and co-founding 4 deep-tech startups, Opaque’s founding team forms an accomplished academic family tree, something which shouldn’t be discounted. They’ve already spent several years bonding while attacking some of the toughest problems in computer science at RISELab. That’s about as battle-tested as they come.

Connecting them further is a shared culture of ownership. Rishabh emphasized how everyone on the team is encouraged to “think like a founder” and own at least one initiative all the way through. For instance, Octavian owns integrity checking for the execution engine and Saharsh owns control plane architecture. “No Meeting Wednesdays” and adherence to a maker’s schedule ensure that everyone has time to work on their initiatives. Both times we spoke, Rishabh showed us his commitment to detail firsthand by staying overtime to make sure we understood their technology and market positioning properly.

It’s hard to imagine a team better equipped to take on this opportunity.

Competitive Landscape

We’ve spent some time analyzing Opaque’s unique strengths as a technology, product, and company. Now, let’s compare them to other approaches for multi-party computation and confidential analytics.

Fully Homomorphic Encryption

Fully homomorphic encryption (FHE) is a cryptographic technique allowing you to perform arbitrary computations (e.g. addition, multiplication) on encrypted data. This also allows for the creation of secure data pools, with the added benefit of requiring no specialized hardware. Although the technology is relatively new, there are already a number of startups (e.g. Enveil, Duality) working to commercialize it.
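To make “computing on encrypted data” concrete, here is a toy sketch using the Paillier cryptosystem. Two caveats up front: Paillier is our choice for illustration and is only additively homomorphic (you can add encrypted numbers, but not multiply them), so it is a much simpler cousin of FHE rather than FHE itself. The primes are also tiny and chosen purely for readability; nothing here is secure.

```python
import math
import random

# Toy parameters (assumption: illustration only, far too small for real use).
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse; needs Python 3.8+


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt with the private key (LAM, MU)."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the plaintexts underneath them."""
    return (c1 * c2) % N_SQ


# Whoever runs this addition never sees 1200 or 345, only ciphertexts.
total = add_encrypted(encrypt(1200), encrypt(345))
print(decrypt(total))
```

The party doing the addition only ever handles ciphertexts; only the key holder can decrypt the result. Supporting arbitrary computation instead of just addition is what makes full FHE so much more expensive.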

Although promising, the challenge with fully homomorphic encryption is that it’s too slow. The first FHE cryptosystem was introduced in 2009 and was 9 orders of magnitude slower than normal computation, taking thirty minutes to run a single bit operation. Things have gotten better since, but state-of-the-art systems remain 4-5 orders of magnitude slower than unencrypted computation. This is unacceptable given today’s massive datasets and computationally expensive techniques.

There has been progress in speeding things up with hardware acceleration (GPUs, FPGAs, ASICs). Unlike hardware enclaves, however, such solutions remain far from widespread industry adoption.

Multi-Party Computation Protocols

Multi-party computation protocols (MPC) are a cryptographic technique allowing multiple parties to jointly compute arbitrary functions over their collective inputs while keeping each input private. Like FHE, it requires no specialized hardware. There are a number of startups (e.g. Inpher) working to commercialize it.
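The simplest possible example of the idea is a secure sum built on additive secret sharing, sketched below. This is a textbook construction we picked for illustration, not the protocol used by any company named here, and the modulus and function names are our own assumptions. Each party splits its private number into random-looking shares, hands one share to every participant, and only the grand total is ever reconstructed.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # all arithmetic is done modulo a fixed, public number


def share(value: int, n_parties: int) -> List[int]:
    """Split `value` into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs: List[int]) -> int:
    """Simulate all parties in one process; no single share reveals an input."""
    n = len(private_inputs)
    # Row i holds the shares party i hands out, one per participant.
    dealt = [share(v, n) for v in private_inputs]
    # Party j adds up the j-th share from everyone and publishes only that sum.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Combining the published partial sums yields the total and nothing else.
    return sum(partial_sums) % MODULUS


# Three hypothetical parties with private values 120, 45, and 300.
print(secure_sum([120, 45, 300]))
```

Even this tiny protocol needs every party to send a message to every other party. That communication pattern is a big part of why general-purpose MPC gets expensive as the computation and the number of parties grow.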

Although faster than fully homomorphic encryption, the cryptographic overhead and repeated back-and-forth required by MPC make it too slow for most modern workflows. As a benchmark, consider three parties pooling data to train a simple convolutional neural network. Using a state-of-the-art protocol like Falcon, it would take 526 weeks and 185 terabytes of bandwidth to train VGG16 (a 16-layer convolutional neural network) on the CIFAR-10 dataset. The performance and bandwidth overheads would only increase with the number of parties involved.

Rishabh and Raluca (Opaque’s CEO and Opaque’s President) actually did a significant amount of research on multi-party computation protocols themselves, developing a system called Senate which improved MPC performance on rich SQL analytics by up to 145x. They chose a different technology to build Opaque on for a reason: even with that improvement, Senate could only handle analytics on small databases with tens of thousands of rows; anything in the hundreds of thousands was unacceptably slow.

Hardware Enclave Systems

Hardware enclaves provide the only reasonably performant approach to securing data in use, carrying an overhead of 0.2x to 2x. There are a number of startups (e.g. Fortanix, Anjuna, Cysec SA) looking to commercialize them at scale.

Out of everyone, however, Opaque is best-positioned to lead the multi-party computation and confidential analytics market:

In terms of technology, Opaque offers the most secure and performant solution.
In terms of product, Opaque offers the most frictionless solution.

Fundamentally, Opaque is the only company building specifically for data scientists. While everyone else is focused more on the cloud security applications of hardware enclaves, Opaque alone is laser-focused on the analytical applications — no one but them has built the peripheral infrastructure required to make data pooling truly frictionless. This almost puts them in their own category within the larger confidential computing market.

The Long Term

Let’s take a step back.

Some of the most valuable enterprise software companies in history were built to centralize data analytics. Oracle, Snowflake, and Databricks all got their start by helping sales, product, and marketing teams share data and insights more easily. Sharing data across different teams was what made it truly valuable.

Sharing it across different organizations stands to make it even more so. Without compromising privacy or security, Opaque’s confidential data pools allow organizations to share data and insights frictionlessly, unlocking entirely new dimensions of value. Financial institutions will collaborate on fraud and credit scoring. Healthcare institutions will collaborate on disease prediction and patient profiling. Manufacturing and logistics institutions will collaborate on supply chain tracking and predictive maintenance. Companies in entirely different industries will collaborate in ways we have no way of predicting. In the future, entire industries will have centralized data repositories to power industry-wide, industry-specific applications and analytics. This is a paradigm shift in business intelligence.

What makes things ✨extra interesting✨ for companies in the space, however, is the potential for network effects. Like social networks, more adopters make data pooling platforms exponentially more valuable: new entrants will be strongly biased towards joining the pools their partners are already in, making the leading platform both defensible and extensible. With time, the platform with the most parties, most pools, and most data will surely dominate.

The race is on.

Comparables

To make things concrete for you, we think it might be helpful to consider some other companies that have become leaders in the data analytics space. Each of the following companies achieved a multi-billion dollar valuation by defining new trends in data analysis. If Opaque can become The Multi-Party Computation Company™ like we expect, then they stand to achieve a similar outcome given the massive potential of data sharing.

Databricks

Databricks is an American software company that popularized the concept of data lakes - centralized repositories of raw data. They allow you to frictionlessly run Apache Spark jobs at massive scale. Like Opaque, Databricks came out of UC Berkeley’s AMPLab (RISELab’s prior incarnation) and was co-founded by Professor Stoica. They’ve surpassed annualized revenues of 1 billion USD, and were last valued at 38 billion.

Snowflake

Snowflake is an American software company that popularized the concept of data warehouses - centralized repositories of structured data. They allow you to frictionlessly scale relational databases and run SQL queries on them at massive scale. Snowflake generated revenues of around 1.2 billion USD in 2022 and currently boasts a market cap of around 50 billion.

Conclusion

Data has never been more valuable, but that value didn’t come from nowhere. Frictionless sharing across teams unlocked the value of data on an organizational scale. Frictionless sharing across organizations will do the same on an industry-wide scale. As the strongest multi-party computation company, Opaque will lead that charge.

They’re hiring: Careers - Opaque Systems.

Thanks for reading! In case you missed our previous pieces, check them out here:

And to make sure you don’t miss any future ones, be sure to subscribe here:

Finally, if you’re a founder or investor with a company you think we should cover please reach out to us at ericzhou@stanford.edu and uhanif@stanford.edu - we’d love to hear about it :)
