Distributed Systems

Architecting the Checkout Lane

Valerie Parham-Thompson

Technical executives in retail face a persistent challenge: balancing loss prevention against friction in the customer experience. Recent implementations of computer vision systems at Sam’s Club demonstrate the potential for technology to meet both goals. At Groceryshop 2024, the CEO of Sam’s Club described their new exit verification system and reported a 14% improvement in the Net Promoter Score, a metric that validates the investment in advanced technological solutions.

Computer Vision

My local Sam’s Club has two arches that you push the buggy through after checking out (on the app or at the registers). About 25 feet past the arches is an employee who will wave you through or check your receipt. Most of the time these days, I’m waved through. Combine this with the scan-to-go app, and I can be in and out of Sam’s Club in about 15 minutes. Compare this to the experience at Costco, which is more like a social event, because I will have enough time to make friends in line!

The Milk Problem: Understanding Modern Retail Inventory Management

Valerie Parham-Thompson

Have you ever ordered milk online only to receive that dreaded out-of-stock notification after placing your order? You’re not alone. After years of ordering groceries online, I’ve learned a secret: ordering through a store’s app right after closing time often yields the best results. Why? I suspect it’s because my order gets fulfilled before in-store shoppers arrive the next morning. But this workaround hints at a deeper problem in retail inventory management, one that affects the nearly half of consumers who rate out-of-stock items as their top shopping frustration, according to a survey by Dynata.

Handling Reserved Keywords in DSBulk for Seamless Data Migration

Valerie Parham-Thompson

Migrating to YugabyteDB offers significant advantages in terms of high availability, global distribution, and horizontal scalability—features essential for managing modern database workloads. However, data migration can be a complex process, particularly when transforming your schema definition. Differences in datatype support, query syntax, and core features across systems can complicate the transformation.

One of the challenges is dealing with reserved keywords in the source schema that cannot be directly used in the target system. This can require changes not only in the database schema during transformation but also in application code and related tooling.
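One common way to handle a name that collides with a reserved keyword is to wrap it in double quotes before it goes into a statement or column mapping. The sketch below is only an illustration of that idea: the `EXAMPLE_RESERVED` set is a small placeholder and not the real keyword list of any database, and the helper names are hypothetical. Whether quoted identifiers are accepted at each step of your migration tooling is something to verify separately.

```python
from typing import Iterable, List, Set

# Placeholder set for illustration only; consult the target database's
# documentation for its actual list of reserved keywords.
EXAMPLE_RESERVED: Set[str] = {"select", "table", "order", "group"}


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def safe_column_list(columns: Iterable[str], reserved: Set[str]) -> List[str]:
    """Quote only the columns whose lowercase form is in the reserved set."""
    return [
        quote_identifier(col) if col.lower() in reserved else col
        for col in columns
    ]


if __name__ == "__main__":
    # Hypothetical source columns, one of which collides with a keyword.
    print(", ".join(safe_column_list(["id", "order", "customer"], EXAMPLE_RESERVED)))
```

Quoting keeps the original column name intact. Renaming the column is the alternative, but as noted above, that change ripples into application code and related tooling.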

Plotly Network Map

Valerie Parham-Thompson

I’ve added a new feature to the day 2 ops tool.

With the diagram command, you can create a map of your Yugabyte cluster overlaid on a map of the world. Here’s an example:

Yugabyte Network Map

The Plotly library is very powerful, with a lot of options. I used the network map option, which allows you to define nodes and the edges between the nodes. In this case, the nodes are an abstraction of the database instances in a YugabyteDB cluster, and the edges represent the network connections between them.
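As a rough sketch of the nodes-and-edges idea, the example below draws a few points on a world map with Plotly's `Scattergeo` trace and connects them with lines. It is not the tool's actual code: the node names, coordinates, and helper functions are made up for illustration, and using `Scattergeo` is my assumption about one way to build such a map.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical nodes: name -> (latitude, longitude). Not real cluster data.
NODES: Dict[str, Tuple[float, float]] = {
    "node-us-east": (39.0, -77.5),
    "node-eu-west": (53.3, -6.3),
    "node-ap-south": (19.1, 72.9),
}

# Hypothetical edges: pairs of node names that talk to each other.
EDGES: List[Tuple[str, str]] = [
    ("node-us-east", "node-eu-west"),
    ("node-eu-west", "node-ap-south"),
    ("node-us-east", "node-ap-south"),
]


def edge_coordinates(
    nodes: Dict[str, Tuple[float, float]], edges: List[Tuple[str, str]]
) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """Flatten edges into lat/lon lists, with None separating each segment."""
    lats: List[Optional[float]] = []
    lons: List[Optional[float]] = []
    for src, dst in edges:
        lats += [nodes[src][0], nodes[dst][0], None]
        lons += [nodes[src][1], nodes[dst][1], None]
    return lats, lons


def build_figure(nodes, edges):
    """Build a world-map figure with one trace for edges and one for nodes."""
    import plotly.graph_objects as go  # imported here so the helpers above work without Plotly

    edge_lats, edge_lons = edge_coordinates(nodes, edges)
    fig = go.Figure()
    fig.add_trace(go.Scattergeo(lat=edge_lats, lon=edge_lons, mode="lines", hoverinfo="skip"))
    fig.add_trace(
        go.Scattergeo(
            lat=[lat for lat, _ in nodes.values()],
            lon=[lon for _, lon in nodes.values()],
            text=list(nodes),
            mode="markers+text",
            textposition="top center",
        )
    )
    fig.update_layout(showlegend=False)
    return fig


if __name__ == "__main__":
    build_figure(NODES, EDGES).show()
```

The `None` entries break the line trace between segments, so all edges can live in a single trace instead of one trace per connection.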

Count Large Partitions in YCQL

Valerie Parham-Thompson

One thing that can really wreck your performance in Cassandra and the similar YugabyteDB YCQL is large partitions due to an imbalanced key. Without the robust nodetool commands of Cassandra, it can be challenging to find these large partitions in YugabyteDB.

DSBulk is a tool used for migrating data, and YugabyteDB maintains a fork that accounts for its slight differences from Cassandra. That fork can be used to list the largest partitions.
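To make "largest partitions" concrete, here is a small conceptual sketch that counts rows per partition key and reports the biggest ones. It does not call the tool mentioned above or reproduce how it works internally. The `largest_partitions` helper and the sample keys are hypothetical, and a real table would be scanned by the tool, not loaded into memory like this.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def largest_partitions(
    partition_keys: Iterable[Hashable], top_n: int = 3
) -> List[Tuple[Hashable, int]]:
    """Return the top_n partition keys by row count, largest first."""
    return Counter(partition_keys).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical partition key values, one per row in the table.
    keys = ["tenant-a", "tenant-b", "tenant-a", "tenant-c", "tenant-a", "tenant-b"]
    for key, rows in largest_partitions(keys, top_n=2):
        print(f"{key}: {rows} rows")
```

A key that dominates this kind of count is the sign of the imbalance described above: most of the data, and most of the load, lands in one partition.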

Optimizing Read and Write Latency

Valerie Parham-Thompson

Today’s global and distributed applications often need to serve user requests from a single data source across different regions. Alongside data scaling and protection against network outages, low-latency access to data is critical for a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.

Open Source Database

Valerie Parham-Thompson

I’m an open-source database consultant. But which open-source database? Well, several of them.

I made the decision several years ago to take every opportunity to work with multiple databases. Why?

  1. Learning a new language teaches you more about your own. For example, taking time to understand sstables in Cassandra gave me more insight into how storage works in MySQL. Working across multiple databases forced me to question what I knew about internals, which deepened my understanding overall.