Distributed Systems

Latency by Pekka Enberg

While many authors have written about database tuning, systems tuning, or code optimization, I haven't seen any come together to cover the whole stack in such a comprehensive way, targeting both software engineers and database architects.

Valerie Parham-Thompson

Having a shared vocabulary across database, software, and infrastructure teams is critical when working together to tune latency issues. I’ve been in many incident rooms where the only report is “the application is slow” and had to unwind a series of questions: What do you mean by slow? Where do you see this? What parts are slow? If everyone in the room had read Enberg’s Latency, solving these kinds of incidents would be much faster.

Query Optimization with HypoPG

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson

Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.

Understanding HypoPG

HypoPG is a PostgreSQL extension that allows you to create hypothetical indexes and see how they would affect your query plans without actually creating the indexes. This is particularly useful when:

Finding the Right Yugabyte Api Endpoint

Finding the Right Yugabyte Api Endpoint

Tour through the YugabyteDB YBA API endpoints with a real-world example

Valerie Parham-Thompson

As YugabyteDB continues to evolve, its extensive API ecosystem offers powerful capabilities for database management and automation. However, with hundreds of API endpoints across overlapping categories, locating exactly the right API endpoint can be challenging. In this guide, I’ll walk you through several proven strategies for efficiently finding the API endpoints you need, along with real-world examples and pro tips I’ve learned from working with YugabyteDB’s API ecosystem.

Method 1: Navigating Categories in the API Documentation

The API documentation (api-docs.yugabyte.com) provides a well-organized categorical view of available endpoints. Understanding how to navigate these categories effectively will significantly speed up your API discovery process:

College Scorecard API

College Scorecard API

Mapping College Scorecard data using the API

Valerie Parham-Thompson

After I finished the YugabyteDB universe network mapping example, I started thinking about other things to map. Anything with latitude and longitude will work. College locations from my previous work on the College Scorecard data set were an obvious choice.

Previously, I had exported the data and transformed it to allow for sorting and analysis. That’s still a valid method if you want to play with the pull data set, since the API allows only page size of max 100 at a time. However, with the right filters, that might be enough, and the API is a quicker path to getting the data.

Plotly Network Map

Plotly Network Map

Using the Plotly library to work with geographic data

Valerie Parham-Thompson

I’ve added a new feature to the day 2 ops tool.

With the diagram command, you can create a map of your Yugabyte cluster overlaid on a map of the world. Here’s an example:

Yugabyte Network Map

The Plotly library is very powerful, with a lot of options. I used the network map option, which allows you to define nodes and the edges between the nodes. In this case, the nodes are an abstraction of the database instances in a YugabyteDB cluster, and the edges represent the network connections between them.

Count Large Partitions in YCQL

Count Large Partitions in YCQL

Counting large partitions in the YugabyteDB Cassandra API

Valerie Parham-Thompson

One thing that can really wreck your performance in Cassandra and the similar YugabyteDB YCQL is large partitions due to an imbalanced key. Without the robust nodetool commands of Cassandra, it can be challenging to find these large partitions in YugabyteDB.

dsbulk is a tool used for migrating data, and YugabyteDB has a fork that takes into consideration slight differences from Cassandra. That tool can be leveraged to list the top largest partitions.

Optimizing Read and Write Latency

Optimizing Read and Write Latency

Reducing latency in reads and writes in YugabyteDB

Valerie Parham-Thompson

Today’s global and distributed applications often need to serve user requests from a single data source across different regions. While providing data scaling and protection against network outages, ensuring low-latency access to data is critical for providing a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.

Tablet Sizing Strategies

Tablet Sizing Strategies

Valerie Parham-Thompson

Modern distributed databases split large tables into tablets to enable parallel processing and efficient data distribution. Finding the right tablet size impacts everything from query performance to operational overhead. Let’s explore how to approach tablet sizing systematically to achieve optimal performance.

Understanding Tablet Impact

Each tablet in your distributed database represents an independent unit of data distribution. When you create tablets, you influence system behavior at multiple levels. The database uses tablets to parallelize operations, manage resources, and handle data growth. Your tablet strategy directly affects query response times, write throughput, and overall system health.

Open Source Database

Open Source Database

What does it mean to be a database engineer of multiple open-source databases?

Valerie Parham-Thompson

I’m an open-source database consultant. But which open-source database? Well, several of them.

I made the decision several years ago to take every opportunity to work with multiple databases. Why?

  1. Learning a new language teaches you more about your own. For example, taking time to understand sstables in Cassandra gave me more insight into how storage works in MySQL. Having these experiences across multiple databases forced me to question what I knew about internals, therefore deepening my understanding overall.

Database Scaling for Seasonal Increases

Database Scaling for Seasonal Increases

Understanding seasonal patterns when planning for database scaling

Valerie Parham-Thompson

Timing of seasonal demand depends on the industry, but a cyclical increase in traffic applies to all industries. Maybe your cycle is shorter than a full year. Or maybe it’s related to things like weather patterns or fashion.

You know when your business gets the most traffic.

  • Retail? Black Friday/Cyber Monday.
  • Health and fitness? New Year’s Day.
  • Pizza restaurant? Halloween and Thanksgiving, surprisingly.

You do know your business well. But do you know how to make sure your database infrastructure can keep up with traffic during these busy periods?