Jeremiah Peschka (@peschkaj) helps developers, DBAs, and engineers build fast, robust, scalable solutions. He’s this week’s guest on the return of Voices. He’s Managing Director at Brent Ozar Unlimited and has been a sysadmin, developer and DBA. That background makes him a perfect candidate to talk about how services like Orchestrate fit into today’s applications.
We hear a lot about polyglot persistence. What are the primary benefits of polyglot persistence, and what are the biggest risks?
Jeremiah Peschka: For those who don’t know, “polyglot persistence” is storing data in more than one format. There are a few benefits to polyglot persistence, but I think the biggest benefit is that developers don’t have to settle for the feature set of a single database engine.
I’ve witnessed a number of scenarios where developers have worked around the limitations of a database engine only to paint themselves into a corner in some other way. With polyglot persistence, it’s possible to get the best of multiple worlds. Whether that’s support for XML, geospatial data, or rich querying semantics, polyglot persistence gives developers the ability to pick and choose a data storage tool based on how the data needs to be queried rather than on which vendor holds the longest-running license agreement.
There are two big downsides that I see to polyglot persistence – data synchronization and querying complexity.
Keeping data in sync is difficult enough when there are multiple database servers of the same kind – e.g. two MySQL systems. When different kinds of systems are involved, not only can data get out of sync, but it can become difficult to figure out which data is out of sync. Teams attempting to solve this problem frequently build complex pipelines that strive to push data through in a controlled fashion, but errors still happen.
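To make the "which data is out of sync" problem concrete, here is a minimal sketch in Python. It assumes both stores can be read as a mapping from key to record; the names `fingerprint`, `find_drift`, `relational_rows`, and `document_store` are hypothetical and not tied to any particular database. It compares a hash of each record and reports what differs.

```python
import hashlib
import json


def fingerprint(record):
    """Hash a record in a canonical form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(store_a, store_b):
    """Report keys missing from either store and keys whose records differ."""
    report = {"missing_in_a": [], "missing_in_b": [], "mismatched": []}
    for key in sorted(store_a.keys() | store_b.keys()):
        if key not in store_a:
            report["missing_in_a"].append(key)
        elif key not in store_b:
            report["missing_in_b"].append(key)
        elif fingerprint(store_a[key]) != fingerprint(store_b[key]):
            report["mismatched"].append(key)
    return report


# Hypothetical snapshots of the same users held in two different systems.
relational_rows = {
    "u1": {"email": "ada@example.com", "plan": "pro"},
    "u2": {"email": "grace@example.com", "plan": "free"},
    "u3": {"email": "alan@example.com", "plan": "free"},
}
document_store = {
    "u1": {"plan": "pro", "email": "ada@example.com"},    # same data, different key order
    "u2": {"email": "grace@example.com", "plan": "pro"},  # plan has drifted
    "u4": {"email": "edsger@example.com", "plan": "free"},
}

drift = find_drift(relational_rows, document_store)
```

A check like this only finds drift after the fact, and only between two snapshots taken at roughly the same time. It does not prevent the errors described above.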
Querying is the lifeblood of an application – if it’s tough to query the data store, that adds friction to the development process. Polyglot persistence can require that developers query across multiple different data sources and aggregate the data together in the application tier. More than once, I’ve heard developers jokingly say “we can just join the data in the application.”
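As a rough illustration of what "joining in the application" involves, the sketch below performs a simple hash join in Python over two in-memory lists. The data and the `join_in_application` function are hypothetical stand-ins for results fetched from two different data stores.

```python
from collections import defaultdict

# Hypothetical results: customers from a relational database,
# orders from a separate document store.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"order_id": "a1", "customer_id": 1, "total": 40.0},
    {"order_id": "a2", "customer_id": 1, "total": 15.5},
    {"order_id": "b1", "customer_id": 3, "total": 9.0},  # no matching customer
]


def join_in_application(customers, orders):
    """Hash join: index orders by customer_id, then probe once per customer."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(order)
    return [
        {"name": c["name"], "orders": orders_by_customer.get(c["id"], [])}
        for c in customers
    ]


result = join_in_application(customers, orders)
```

Note that order `b1` silently disappears from the result because no customer matches it. A single database could enforce that relationship with a foreign key; in the application tier, the developer has to notice and handle it.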
What advice would you give to enterprise organizations looking to adopt NoSQL databases? How can they complement existing SQL databases in production?
I encourage enterprises looking to adopt NoSQL databases to try out pilot projects that enhance existing application functionality. Different non-relational databases excel at different things, so it’s important to match the database with the problem that is being solved.
Building a brand-new application is too risky a way to test out a new database platform. A safer approach is to enhance an existing application, either by adding new functionality or by making something faster. I find that this kind of low-risk augmentation of an existing application is helpful.
For example, in an application that stores complex relationships like a complete bill of materials or a complex order, it’s possible to enhance the application by saving the entire complex relationship in a key/value store. This can improve perceived read performance – reading a complete order is just a single primary key lookup.
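A minimal sketch of that idea in Python follows. A plain dictionary stands in for the key/value store, and the `order:<id>` key format, `save_order`, and `load_order` are assumptions made for illustration rather than any specific product’s API.

```python
import json

# A dict stands in for a real key/value store in this sketch.
kv_store = {}


def save_order(order):
    """Serialize the whole order, nested parts included, under one key."""
    kv_store[f"order:{order['id']}"] = json.dumps(order)


def load_order(order_id):
    """Read a complete order with a single key lookup; None if absent."""
    raw = kv_store.get(f"order:{order_id}")
    return json.loads(raw) if raw is not None else None


# Hypothetical order that would span several tables in a relational schema.
order = {
    "id": 1001,
    "customer": {"id": 7, "name": "Ada"},
    "lines": [
        {"sku": "WIDGET-1", "qty": 2, "unit_price": 9.5},
        {"sku": "GADGET-9", "qty": 1, "unit_price": 24.0},
    ],
}

save_order(order)
loaded = load_order(1001)
```

In this arrangement the relational database would presumably remain the system of record, with the key/value copy rewritten whenever the order changes. That is where the synchronization concerns raised earlier come back in.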
When does it make the most sense to run your own databases? When does it make the most sense to use a service like Orchestrate?
It makes the most sense for companies to run their own databases when they need exacting control of the service. This could be for regulatory compliance reasons (although many hosted services can meet those requirements). Other organizations may opt to host the application themselves in order to meet some kind of SLA (Service Level Agreement).
When service agreements are involved, teams want to have fine-grained control over most aspects of the environment. It’s not enough to know that failover will be “quick” and downtime will be “infrequent” – some teams may need to engineer failovers so that some aspects of service can be automatically restored in a matter of minutes. Strict performance requirements may also necessitate running your own database. This is typically the realm of a database tuning specialist.
It makes sense to use hosted services like Orchestrate under a different set of conditions. Those conditions aren’t “when you don’t care about performance” or “when you don’t have an SLA” but rather when you want to outsource the management of SLAs and performance.
Moving to a hosted service like Orchestrate abstracts away the implementation details of the storage engine. It doesn’t matter to the developer how the underlying database is implemented, so long as SLAs are met. This provides huge benefits to people consuming the service – only functionality is important. As long as the database can deliver the functionality, developers can keep writing code against the API.
How is the role of the DBA changing?
I don’t see the DBA’s role changing so much as I see it expanding. DBAs traditionally have focused only on relational databases. With a variety of databases available, there’s plenty of work for both operational (lights on) DBAs and developer (tuning) DBAs. Both types of DBAs can expand their focus to meeting availability, recovery, and performance goals across multiple platforms. Many DBAs already possess the core skills; they just need to apply them across new platforms.
Even with hosted services, the DBA still has a role to play. DBAs still have to focus on making sure SLAs are being met. If SLAs aren’t being met, the DBA is in a unique position to identify a root cause and work with application developers or platform vendors to resolve the issue. Outside of monitoring performance, the DBA still has a role ensuring the data can be recovered, transferred to another platform, and exported into another form for reporting and analytics.
Just because things are changing, it doesn’t mean that DBAs aren’t needed.