Christine Spang: Wow! Hi everybody. Thanks for coming to Empower, our first conference. I'm so excited to see all of you here in the room today. Today, I'm gonna be telling you a bit about the story of how Nylas built a stable and scalable API that is used by dozens of customers to power their applications. We've been doing this for just about four years at this point and along the way we've learned a few things here and there that I hope you'll find useful, for those of you that are building APIs and services to help other people get things done. Here's an overview of what I'm gonna cover. I'm gonna give you a little bit of brief context about what exactly Nylas does, then I'll talk a little bit about the technology that powers it, and then for the meat of the talk we'll be talking about some of the things that we've learned over the last four years. So what exactly does Nylas do?
Christine Spang: The reason that Nylas exists is because email contains so much really valuable data and yet it's really hard to work with as developers. Email contains things like photos from your sister, event tickets and receipts, personal communications, meeting invitations and more. But email is about 40 or 50 years old, and computing has changed a lot in the last 50 years. A lot of different functionality has been added on over the decades and that's resulted in a lot of complexity. The latest versions of protocols to interact with email came from the late '90s and early 2000s, which means they don't meet the standards that modern developers expect. Developers expect JSON, they expect UTF-8 text and they expect REST APIs that are consistent and easy to use and come with SDKs that you can use to get started with them really quickly. Email falls far short of meeting this bar and that's why we built Nylas. Instead of having to build integrations with lots of different protocols in order to connect email accounts to applications, you can use one single, simple, easy-to-use integration and get access to email inboxes and all the valuable data within them.
Christine Spang: Let's talk a little bit about the technology that powers Nylas. For those of you who are engineers in the room, you might be interested to hear a little bit about what tech we're using. So we're a Python shop and we use a few major libraries that are pretty popular in Python land. We power our API with a REST API library called Flask. We use a library called Gevent to deal with concurrency and our ORM is a thing called SQLAlchemy and we also use Pytest. To serve the application, our front-end load balancer is a proxy called HAProxy and then we serve the application using Nginx and Gunicorn. Our primary data store is MySQL, which we actually manage ourselves with primary-replica clusters that we run on EC2.
Christine Spang: Our application talks to MySQL using a high performance proxy called ProxySQL and then we orchestrate our servers using a config management system called Ansible and then we use Redis for caching and queuing. For those of you who are engineers in the room, when you look at the list, you'd probably think that it's extremely ordinary and that's entirely the point. When we were getting started building Nylas, we decided that we wanted to spend most of our time and energy figuring out whether or not the API was actually something that people would find useful and get value out of and also figuring out how to build a business around it.
Christine Spang: As start-ups, we don't have time to wake up in the middle of the night, because we're using some crazy database that is corrupting data and if you ever read Dan McKinley's fantastic essay, "Choose Boring Technology," you may have seen this quote, it says "Let's say every company gets about three innovation tokens, you can spend these however you want, but the supply is fixed for a long while." So we decided not to spend these on our base technology, but instead to spend them figuring out what would make customers lives easier and make developers happy.
Christine Spang: In the early days we also spent a lot of time thinking about what driving principles should lie behind the decisions we were making about how to structure and form the API. And the thing that we kept coming back to was that the entire point of building Nylas was that clients and developers should have to build one single integration to power their apps, not seven, which was the status quo. So we spent a lot of time making sure that we could abstract across differences between providers and provide one simple, unified email API.
Christine Spang: In the beginning when we were trying to figure out how we should actually build the software behind this, we had a few different things that we thought about. The end goal was to provide a consistent, simple, easy to use, and fast API that developers could use to access e-mail data. And there are a few different choices that we could make that could get us there. The first possible strategy was that we could try to sync as little data as possible and instead, when people make queries to our API, simply proxy them to upstream providers like Google or Microsoft. Or we could do something a bit more aggressive and we could actually sync and mirror the contents of the email inboxes into our data store so that we could serve most queries without ever having to talk to the upstream providers.
Christine Spang: And so we were thinking about these different strategies and we went back to what our end-goals were. And that was to provide a fast and simple and easy to use API that people could use instead of talking to upstream providers. And this meant that we had to choose the second strategy. Proxying, while cheap, has a number of different disadvantages. One, it adds a lot of latency to each API request because you have to open another request within the request to talk to your upstream provider. Plus, we didn't wanna be limited to the constraints of what things were easy to query with the existing APIs. Which meant that we didn't wanna have to compose different queries to the upstream providers and pull data together in order to serve simple API queries.
Christine Spang: We also needed our API to be really reliable so that folks could use it instead of upstream providers, and if you're proxying, that means that your reliability is actually the product of your software's uptime, the reliability of the network link that you're using, and the reliability of the upstream email providers. And those providers do have outages. We've seen Microsoft outages over the last few years, we've seen Google outages, and we didn't think that met our bar for quality. So we had to go with the second option and figure out how to make the business work around that even though it costs more.
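To make the arithmetic behind that reliability argument concrete, here is a minimal sketch that is not from the talk. It assumes each component fails independently, and the availability figures are hypothetical numbers chosen only for illustration. The function name `combined_availability` is likewise made up for this example.

```python
from math import prod


def combined_availability(*components: float) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes the components fail independently, so the individual
    availabilities multiply.
    """
    return prod(components)


# Hypothetical figures, not measurements from Nylas or any provider.
own_service = 0.999
network_link = 0.999
upstream_provider = 0.995

# Proxying: every API request depends on all three components.
proxied = combined_availability(own_service, network_link, upstream_provider)

# Mirroring: a read served from the local data store depends only on the service.
mirrored = combined_availability(own_service)

print(f"proxied path:  {proxied:.4%}")
print(f"mirrored path: {mirrored:.4%}")
```

Because every factor is at most 1, the proxied path can never be more available than its weakest component, which is the reason given above for mirroring the data instead.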
Christine Spang: Here's a rough outline of what our application architecture looks like. It's actually pretty simple. We essentially have two major fleets of servers that do most of the work and one primary data store which is MySQL. We have a fleet of servers that runs and serves our API so it's serving hundreds of requests per second as clients ask for data and ask to move it and change it. And then we have a fleet of servers that actually runs behind the scenes and is continuously syncing email inboxes and keeping our data store up to date. We also have a front-end proxy that load balances the API and we use Redis for doing some caching and queuing behind the scenes. This architecture is really not very complicated and there's a reason for that as well. The reason for that is really that we wanted to spend as much time as possible figuring out the product and the business. Making the application architecture more complicated before you need to means that you'll just spend more time figuring out how to operate it.
Christine Spang: So lets dive more into the lessons that we've learned from building this product. The first one, which I've already mentioned is to, and I'm repeating it here because I think it's really important, is to use boring technology and to not get hung up on using the most cutting edge thing. The importance of using stable building blocks really varies depending on which part of your application that you're building but things like databases for example, it's really important to use something that's battle tested. The second part of this is to not use microservices until you need them and if you have one engineering team that's building one product you probably don't need them yet. If you start using microservices before you need them, you'll have to invest a ton in operational infrastructure in order to be able to maintain these well and it also makes debugging more complicated. We've synced and served over ten billion emails and we still haven't broken our service down into a complete microservice architecture. It's on the horizon and we know it'll happen eventually but no need to over-engineer things before it's needed.
Christine Spang: The third thing I wanna talk about is how open source is an important part of building developer trust. When we launched the Nylas API the first thing we actually launched was not a running service that you could use to power your application. It was a code repository that we released on GitHub and we were actually really terrified to release our code publicly on GitHub because we were afraid that people might look at it and criticize it or steal it and copy us. But it turned out that looking back in retrospect, this was a really powerful decision that really impacted our growth and helped us get off the ground.
Christine Spang: A lot of folks don't release any server code for their products. They'll release client SDKs that folks need to build into their application to use them. But when it comes to understanding how the backend works, developers are essentially left in the dark. And in the beginning, when you're trying to get people to use your products to build their applications, there's a lot of trust that you have to build. And one way to build developer-trust is to show them the source code. People are taking a big risk by building their applications on top of a new platform, and they know that. So you need to work extra hard to get over that hump, especially in the beginning. By putting our source code out there and saying, "Well, you know if we go away as a company, you folks can take our code and continue to maintain it and run it yourselves," we got over that initial hump of building developer trust. And that was really important for getting us to the place that we are today.
Christine Spang: The fourth thing I want to talk about, is that it's really important to be consistent in your API. And what I mean by this is a couple of things. One, if you have many different APIs that serve different data, you should use the same structure for all those different APIs. One of the advantages of Nylas over, for example, upstream provider's APIs, is that you can access email, and contacts, and calendar data, all with the same simple structure, because it was built and designed by the same team. Consistency makes it much easier to get started, and to be able to use different parts of an API while having learned only one part of the API, because you can expect that the other parts will work in the same way, as the parts that you've already worked with. Which makes the learning barrier a lot lower.
Christine Spang: And it's also important not to break your API. And what I mean by this is, when you release new features or release bug fixes, folks have taken a big risk building their application on top of your platform. And they need to trust you, to not break their application when you're changing your API platform. There's a number of different things that we've kind of figured out over the years, about how to do this. And one is for example, when you're building a new API, it's really important to move fast and iterate. That's how you do good, agile development practices. But that's at odds with this idea, that when you release something and people start building on top of it, you can't really change it. So, we worked around this by doing things like writing the API docs and spec first. And sharing those with the interested customers or people that've opted into beta of a certain feature, in order to get feedback really rapidly, even before we started building any code.
Christine Spang: Then when we actually build the first version of a new feature or an API, we will often not publicly document it until we've validated that feature with some actual customers who are gonna be building on top of it and using it. And these two simple strategies have allowed us to reduce the number of things that are weird, or inconsistent, or kinda broken that need to be fixed later. There are also strategies that you can use to deal with incompatible changes that you need to make. For example, Stripe uses this strategy called gating, in which you actually pin clients to certain versions of your API, and as you introduce breaking changes you allow them to upgrade versions but leave the old logic in the code so that you can still serve the version that old clients expect. And this is a totally valid and useful strategy, but it's also important to minimize doing that in the first place, because maintaining all of these different code paths to serve different versions of your API adds up to more maintenance burden over time.
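The gating idea described above can be sketched in a few lines. This is only an illustration of the general technique, not Stripe's or Nylas's actual code. The gate name `flat_participants`, the date-based versions, and the message fields are all hypothetical.

```python
from datetime import date

# Hypothetical breaking changes, keyed by the API version that introduced them.
GATES = {
    "flat_participants": date(2018, 3, 1),
}


def gate_enabled(gate: str, pinned_version: date) -> bool:
    """A client sees a change only if it is pinned at or after that version."""
    return pinned_version >= GATES[gate]


def render_message(message: dict, pinned_version: date) -> dict:
    """Serialize a message in the shape the client's pinned version expects."""
    body = {"id": message["id"], "subject": message["subject"]}
    if gate_enabled("flat_participants", pinned_version):
        # New behavior: one combined list of participants.
        body["participants"] = message["to"] + message["cc"]
    else:
        # Old behavior, kept in the code so existing clients do not break.
        body["to"] = message["to"]
        body["cc"] = message["cc"]
    return body


message = {"id": "m1", "subject": "Hi", "to": ["a@example.com"], "cc": ["b@example.com"]}
old_client = render_message(message, pinned_version=date(2017, 6, 1))
new_client = render_message(message, pinned_version=date(2018, 6, 1))
print(old_client)
print(new_client)
```

Every entry in `GATES` adds one more branch that has to be kept working and tested, which is the maintenance cost mentioned above.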
Christine Spang: Another thing that we've found is really useful is that documentation is great, but documentation that helps you build your app is even better. Developers love being hands-on, and being able to run things really quickly. And if your developer documentation does things like gives you code snippets that you can run, generates the right cURL request that you can copy and paste into your terminal, and run and see it in action, gives people a much more tangible understanding of your API and how it works. And also helps developers get off the ground much more quickly. It's easier to go from a code snippet that basically does the gist of the right thing that you want to do, and modify that to plug it into your app, than it is to start from scratch, starting just from documentation. There's even SaaS services out there that can help you build these kind of docs really easily. We use a product called ReadMe and it's been really helpful for us in terms of providing better documentation for developers.
Christine Spang: Number seven, have a Status Page. Now we're gonna talk a bit about reliability which is another really important piece of having developers and companies trust you. If your app is down, some part of their app is down and it's impossible to have an API that's never down. But it is possible to have very good uptime. It's also really important that when you are having issues that you communicate them really well to people that depend on your service. We use a SaaS service called statuspage.io for our status page. So you don't even have to do a whole ton of work to get a status page for your service. There're also folks out there who build their own bespoke versions, but you don't have to and a lot of us don't have the time, so we can just use a service that gives us a status page out of the box.
Christine Spang: Number eight, availability and uptime is great but it's only the foundation of reliability and trust. If your API is up all of the time but your customers are running into issues where there are bugs and it's not doing what they want, then essentially to them your product is down. So it's really important to think of reliability holistically, both in terms of whether the service is up and responding to requests, like what the success rate is of requests that you're serving, how many 500s you're serving, how many errors you're recording in the application, and also in terms of actually powering people's applications well. To do this, you need instrumentation and monitoring. When you're a platform, you need to be able to drill in to the metrics behind your API and visualize what exactly a customer is actually experiencing from the point of view of their application.
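The success-rate and 500s figures mentioned above come down to simple counting. The sketch below is an illustration, not Nylas's monitoring code. It assumes that "success" means any response that is not a 5xx, which is one possible definition among several, and the function name `request_health` is hypothetical.

```python
from collections import Counter


def request_health(status_codes: list[int]) -> dict:
    """Summarize a batch of HTTP status codes.

    Assumption: a request counts as successful unless the server answered
    with a 5xx. Client errors (4xx) are not counted against the service here.
    """
    by_class = Counter(code // 100 for code in status_codes)
    total = len(status_codes)
    server_errors = by_class[5]
    return {
        "total": total,
        "server_errors": server_errors,
        "success_rate": (total - server_errors) / total if total else None,
    }


# Hypothetical sample of responses served in some time window.
print(request_health([200, 200, 404, 500]))
```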
Christine Spang: We use a few different tools to do this and the most important one is this thing called Honeycomb. If you have ever worked at Facebook or have read a Facebook paper called the Scuba paper, this kind of tool might be familiar to you. And what Honeycomb allows us to do is we basically pipe a feed of logs and events to it and this includes the logs of what requests we're serving and we can tag these API requests with all sorts of different information. The whole point of this tool is that you can add arbitrary number of fields and then you can query and analyze based on those fields. So one thing that we ship to Honeycomb is for example the ID of which application made a request. So when we're investigating issues for a particular customer, we can go to Honeycomb and pull up exactly which requests they've been using, see the latency of those requests, see any errors and see what it looks like from their perspective so that we can make sure that they are being successful.
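The idea of tagging each request with arbitrary fields and then slicing by them can be sketched without any particular tool. The code below does not use Honeycomb's client or query language. It is a plain-Python illustration, and the field names (`app_id`, `endpoint`, `latency_ms`) and sample events are hypothetical.

```python
from statistics import median

# Hypothetical request events; each one can carry any number of fields.
EVENTS = [
    {"app_id": "app_a", "endpoint": "/messages", "status": 200, "latency_ms": 40},
    {"app_id": "app_a", "endpoint": "/events", "status": 500, "latency_ms": 900},
    {"app_id": "app_a", "endpoint": "/messages", "status": 200, "latency_ms": 60},
    {"app_id": "app_b", "endpoint": "/messages", "status": 200, "latency_ms": 35},
]


def where(events: list[dict], **filters) -> list[dict]:
    """Keep the events whose fields equal every given filter value."""
    return [e for e in events if all(e.get(k) == v for k, v in filters.items())]


def customer_view(events: list[dict], app_id: str) -> dict:
    """What one application experienced: request count, errors, latency."""
    mine = where(events, app_id=app_id)
    return {
        "requests": len(mine),
        "errors": [e for e in mine if e["status"] >= 500],
        "median_latency_ms": median(e["latency_ms"] for e in mine) if mine else None,
    }


view = customer_view(EVENTS, "app_a")
print(view["requests"], len(view["errors"]), view["median_latency_ms"])
```

Because `where` accepts any field name, adding a new tag to the events makes it queryable without changing the query code, which is the property described above.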
Christine Spang: We also use more traditional monitoring metrics tools. This is a screenshot from one of our dashboards just for checking out system status. And these are also an important part of instrumenting your API which a lot of folks may be more familiar with because you need these even if you're not running a platform. But when you're investigating issues, you may start with something like Honeycomb where you can dive in and figure out where the problem is coming from and you might need to fall back on things like traditional metrics in order to debug systems issues.
Christine Spang: The last thing I wanted to talk about is really the human element. At the end of the day, we're all building products and services that are about helping other people get something done better faster and with fewer resources than they have. Together, we can help people build things that they wouldn't be able to build before and that's what motivates me at the end of the day. None of the things that we're building are necessarily rocket science but no one has enough people and time to build everything themselves. And that's why SaaS is taking off, that's why it's important and that's why we're having this conference. So thanks to you all for coming.
Moderator: Thank you again, that was awesome. Does anyone have any follow-up questions for her? And we have two mic runners, we'll bring the mics back to you.
Audience Question 1: Hi, so we're a small SaaS company and we've just released the first version of our API and I noticed that you were talking about keeping an API stable and not breaking an API. And we're getting a lot of requests from customers to add new features to the API. And we're wondering how do you balance the need to add new features and grow the ability of the API really quickly without breaking things? And I feel like if we made all the changes that people ask for, there'll need to be a lot of change in future versions.
Christine Spang: Yeah, there's a couple of things I would think about here, one is, talk to a lot of your customers and validate those ideas on other customers because if you're building a platform, it's really important to make sure that things that you're committing to support are things that you're not gonna be building just for one single customer. Sometimes, that can be appropriate if it's a really important customer but in general, you should be managing your roadmap by having regular customer conversations, taking ideas, validating them. And then I would go back to the things that I actually mentioned in the talk which is, for changes and new features that you are going to build, write the API spec, run it by your customer and then don't publicly document until you're sure that it works for them at least. And then in the long run, you'll wanna look into something like having some sort of versioning on your API. It's impossible to predict the future but you can do some upfront investment that will reduce the risk and I think it really, really comes down to communicating a lot with the people that are gonna be using your product. Any other questions?
Audience Question 2: Sure. So my question is a bit related but it's more about the aspect of releasing your source code on GitHub. And basically balancing your willingness to open your code versus now you also have to maintain it for the whole developer community, who are gonna come back to you with questions, and how that works. How can we improve this one? Do you invest much of your time in maintaining that, or once you release it, it's a done deal and it's out of the door?
Christine Spang: Yeah, for sure. For our server side code, I have to say that we've gotten a lot of value out of just releasing it and haven't spent much effort in trying to build a community around it which is... There are companies out there that do invest a lot of time and energy in doing that but for our server side code it's actually really tricky to do that because, for example, in the early days we had things like people being like, "Well, I wanna run this using Postgres," and it's completely not a good use of our time to modify our code to support multiple databases. And so we pretty much decided that, for stuff like that, we would tell them to fork it and maintain that on their own.
Christine Spang: So I think the main uses that we've seen... There are some people that do take our code and run it themselves. But we also have customers that will, because email is a very heterogeneous ecosystem and there are kind of weird edge cases and things that can go wrong, some customers of ours actually use our source code when our dashboard is showing them that some account is having a sync error and they're trying to figure out what that means. So, I think for a lot of folks, spending a ton of time building a community is not necessarily a great use of their time. Any other questions? Yeah.
Audience Question 3: Yes, so you mentioned that you sync emails from all the providers to your servers, do you also do that with calendars? 'Cause I know a lot of calendar APIs don't give you WebHooks when new events get added, so I'm curious, how you manage that.
Christine Spang: Yeah, we do the same thing for contacts and calendars and it's actually a lot simpler for contacts and calendars because it's a lot, it's like an order of magnitude less data. So it's less expensive for us to run and just easier to manage but there wasn't really a reason for us to consider a different strategy for that especially since it's an easier problem overall and we were already... We solved the email problem first and it's like, "Well, we'll just do the same thing for calendars."
Christine Spang: Great! Well, thanks again for listening to my talk and for showing up.