Distributed-Systems

Plotly Network Map

Valerie Parham-Thompson

I’ve added a new feature to the day 2 ops tool.

With the diagram command, you can create a map of your YugabyteDB cluster overlaid on a map of the world. Here’s an example:

Yugabyte Network Map

The Plotly library is powerful and offers a lot of options. I used its network graph approach, which lets you define nodes and the edges between them. Here, the nodes represent the database instances in a YugabyteDB cluster, and the edges represent the network connections between them.
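To make the nodes-and-edges idea concrete, here’s a minimal sketch of the data a network map like this needs. It is not the diagram command’s code: the node names and coordinates are hypothetical, and it assumes every node connects to every other node (a full mesh). Edges are flattened into longitude/latitude lists with `None` separators, a common way to draw many separate line segments with a single Plotly trace.

```python
from itertools import combinations

# Hypothetical cluster: node name -> (latitude, longitude). Replace with your own.
NODES = {
    "node-us-east": (39.0, -77.5),
    "node-eu-west": (53.3, -6.3),
    "node-ap-south": (19.1, 72.9),
}


def full_mesh_edges(nodes):
    """Assume every node talks to every other node, so return all pairs."""
    return list(combinations(sorted(nodes), 2))


def edge_coordinates(nodes, edges):
    """Flatten edges into lon/lat lists with None separators.

    A single line trace draws one segment per edge; the None entries
    break the line so that unrelated edges are not joined together.
    """
    lons, lats = [], []
    for a, b in edges:
        (lat_a, lon_a), (lat_b, lon_b) = nodes[a], nodes[b]
        lons += [lon_a, lon_b, None]
        lats += [lat_a, lat_b, None]
    return lons, lats


edges = full_mesh_edges(NODES)
edge_lons, edge_lats = edge_coordinates(NODES, edges)
node_lats = [lat for lat, _ in NODES.values()]
node_lons = [lon for _, lon in NODES.values()]
```

With Plotly installed, `edge_lons` and `edge_lats` could feed a `plotly.graph_objects.Scattergeo` trace with `mode="lines"`, and `node_lons`/`node_lats` a second trace with `mode="markers"`.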

Optimizing Read and Write Latency

Valerie Parham-Thompson

Today’s global, distributed applications often need to serve user requests across different regions from a single data source. Distributing data lets you scale and protects against network outages, but low-latency access to that data is still critical for a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.

Leveraging time to live (TTL)

Valerie Parham-Thompson

In both MySQL and Postgres, expiring records after a set period of time takes a couple of timestamps and a little creativity. With Cassandra, or in this case the YugabyteDB YCQL API, you can use TTL (time to live) to handle this instead, simplifying both the table definition and the amount of work your code has to do.

Here’s a short test to demonstrate. As a reminder, you can set up a quick three-node cluster using the code here: https://github.com/dataindataout/xtest_ansible.
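For contrast, here’s a minimal sketch of the timestamp approach that TTL replaces, using Python’s built-in sqlite3 as a stand-in for MySQL or Postgres. The table and function names are hypothetical, and the clock is passed in as `now` so the behavior is easy to test. In YCQL, the database does this bookkeeping for you when you write rows with `USING TTL <seconds>`.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # The "couple of timestamps": when the row was written and when it expires.
    conn.execute(
        "CREATE TABLE sessions (id TEXT PRIMARY KEY, payload TEXT, "
        "created_at INTEGER NOT NULL, expires_at INTEGER NOT NULL)"
    )
    return conn


def insert_with_ttl(conn, row_id, payload, ttl_seconds, now):
    # In real code, now would typically be int(time.time()).
    conn.execute(
        "INSERT INTO sessions VALUES (?, ?, ?, ?)",
        (row_id, payload, now, now + ttl_seconds),
    )


def live_rows(conn, now):
    # Every read must remember to filter out rows that have expired.
    cur = conn.execute(
        "SELECT id FROM sessions WHERE expires_at > ? ORDER BY id", (now,)
    )
    return [row[0] for row in cur]


def purge_expired(conn, now):
    # Something must also delete expired rows on a schedule, or they pile up.
    cur = conn.execute("DELETE FROM sessions WHERE expires_at <= ?", (now,))
    return cur.rowcount
```

Every reader has to remember the `expires_at` filter, and some job has to call `purge_expired` regularly; with TTL, neither step lives in your code.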

Processing Data with Pandas

Valerie Parham-Thompson

I’ve been experimenting this week with processing historical NOAA weather data in Pandas and storing it in a local YugabyteDB cluster. This open data set contains maximum and minimum temperatures and precipitation for years going back to 1750 (not all data points are available for all years or locations). It’s available here: https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html

I leveraged my existing demo framework to provision a local YugabyteDB cluster, and then used Pandas to import data from txt and csv files. The txt lookup files were countries, states, stations, and inventory. The csv files were available in different formats. The code I’ve linked below imports all weather data for a single year.
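Here’s a minimal sketch of the kind of reshaping Pandas makes easy; it is not the linked code. It assumes the by-year CSV layout as I understand the GHCN-Daily readme (no header row; station ID, YYYYMMDD date, element, value, three flag columns, observation time) and that temperatures and precipitation are stored in tenths of a unit. The column names and sample rows are hypothetical, so check the readme before relying on them. The sketch drops rows with a quality flag, then pivots one row per observation into one row per station and day.

```python
import io

import pandas as pd

# Assumed column order for the headerless by-year files (names are my own).
COLUMNS = ["station_id", "date", "element", "value",
           "m_flag", "q_flag", "s_flag", "obs_time"]

# Hypothetical sample rows in that layout.
SAMPLE = """USW00094728,20200101,TMAX,78,,,W,
USW00094728,20200101,TMIN,22,,,W,
USW00094728,20200101,PRCP,0,,,W,2400
USW00094728,20200102,TMAX,94,,,W,
USW00094728,20200102,TMIN,39,,,W,
"""


def load_year(source) -> pd.DataFrame:
    df = pd.read_csv(source, header=None, names=COLUMNS,
                     dtype={"date": str, "obs_time": str})
    # Keep only rows that passed quality checks (empty q_flag).
    df = df[df["q_flag"].isna()]
    df = df[df["element"].isin(["TMAX", "TMIN", "PRCP"])]
    # One row per station and day, one column per element.
    wide = df.pivot_table(index=["station_id", "date"], columns="element",
                          values="value", aggfunc="first").reset_index()
    wide.columns.name = None
    wide["date"] = pd.to_datetime(wide["date"], format="%Y%m%d")
    # Assumed units: tenths of degrees C and tenths of mm.
    for col in ["TMAX", "TMIN", "PRCP"]:
        if col in wide:
            wide[col] = wide[col] / 10
    return wide
```

From there, one option for loading the frame into YugabyteDB is `DataFrame.to_sql` with a SQLAlchemy engine pointed at YSQL, which speaks the PostgreSQL protocol (port 5433 by default).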

Foreign Data Wrappers

Valerie Parham-Thompson

I was recently setting up a demo to show off query logging features. Two common extensions, pg_stat_statements and pg_stat_monitor, store their data locally on each node. In a distributed database, it’s helpful to combine the query runtimes from all nodes.

YugabyteDB supports foreign data wrappers, so I decided to use this feature to combine query statistics from each of my three test nodes.
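Whatever transport gathers the rows (foreign tables, in this case), combining per-node statistics means summing calls and total time per query and then dividing, rather than averaging each node’s mean. Here’s a minimal Python sketch of that arithmetic on hypothetical rows; the field names are illustrative, since the real columns differ between pg_stat_statements and pg_stat_monitor.

```python
from collections import defaultdict

# Hypothetical per-node rows: (query_id, calls, total_ms).
NODE_STATS = {
    "node1": [("q1", 10, 100.0), ("q2", 5, 50.0)],
    "node2": [("q1", 30, 900.0)],
    "node3": [("q2", 5, 150.0)],
}


def combine(node_stats):
    """Sum calls and total time per query across all nodes."""
    totals = defaultdict(lambda: [0, 0.0])
    for rows in node_stats.values():
        for query_id, calls, total_ms in rows:
            totals[query_id][0] += calls
            totals[query_id][1] += total_ms
    # Cluster-wide mean = combined time / combined calls.
    return {
        qid: {"calls": c, "total_ms": t, "mean_ms": t / c if c else 0.0}
        for qid, (c, t) in totals.items()
    }
```

For `q1`, averaging the node means (10 ms and 30 ms) would report 20 ms, while the combined figure is 1000 ms / 40 calls = 25 ms.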

The libraries for the pg_stat_monitor extension are already installed, so the extension just needs to be created:

YugabyteDB Snapshots

Valerie Parham-Thompson

A distributed database is designed to withstand outages to a good degree. However, you should also maintain backups in case of “oops” scenarios like a dropped table.

The yb-admin tool can be used to manage snapshots. Here’s a brief walkthrough.

A few caveats about snapshots: they are stored on the same server, so this method doesn’t protect against file system corruption. Also, a snapshot captures only the data, not the schema.

If you don’t already have a test environment, check out a quick test setup here: https://github.com/dataindataout/xtest_ansible.
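As a hedged sketch of a create, list, and restore cycle, this Python helper assembles the yb-admin calls. It assumes yb-admin is on your PATH, masters listening on the default RPC port 7100, and the `create_snapshot <keyspace> <table>`, `list_snapshots`, and `restore_snapshot <snapshot_id>` subcommands; the addresses, keyspace, and table names are hypothetical. It returns the command text by default rather than running anything.

```python
import shlex
import subprocess

# Hypothetical local masters on the default yb-master RPC port.
MASTERS = ["127.0.0.1:7100", "127.0.0.2:7100", "127.0.0.3:7100"]


def yb_admin(*args, masters=MASTERS):
    """Build the argv for one yb-admin call (assumes yb-admin is on PATH)."""
    return ["yb-admin", "-master_addresses", ",".join(masters), *args]


def run(argv, dry_run=True):
    """Return the command line by default; set dry_run=False to execute it."""
    if dry_run:
        return shlex.join(argv)
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


# Hypothetical keyspace and table names.
create_cmd = yb_admin("create_snapshot", "demo_keyspace", "demo_table")
list_cmd = yb_admin("list_snapshots")


def restore_cmd(snapshot_id):
    # The snapshot ID comes from the list_snapshots output.
    return yb_admin("restore_snapshot", snapshot_id)
```

Keeping the dry run as the default makes it easy to eyeball each command before pointing it at a real cluster.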

String Search

Valerie Parham-Thompson

Quick post to share my presentation last week at the YugabyteDB Friday Tech Talk. It was on fuzzy matching and, more generally, string search. I got to nerd out on two of my favorite topics: words (broadly, linguistics; specifically, names) and databases. Check it out!

(Code for scenarios in my repo, here: https://github.com/dataindataout/xtest_ansible/tree/main/scenarios/fuzzy)

https://www.youtube.com/watch?v=vmHRnR1nFdQ
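Edit distance is one common building block for fuzzy matching; Postgres’s fuzzystrmatch extension exposes it as `levenshtein()`. Here’s a minimal Python version of that metric for illustration. It isn’t taken from the talk or the linked scenarios, which may use other approaches.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

For names, a small threshold such as `levenshtein(query.lower(), name.lower()) <= 2` catches typos like "Thomson" for "Thompson" (distance 1).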

Audit Logging

Valerie Parham-Thompson

Gearing up for my YFTT presentation next month. It will be on fuzzy matching, a chance to show off some neat string search features.

Meanwhile, here’s the deck for my last YFTT. The topic was audit logging.

https://info.yugabyte.com/hubfs/YFTT%20Slide%20Decks/2022_12_02_YFTT_Valerie%20Parham-Thompson_Audit%20Logging%20in%20YugabyteDB.pdf

Audit logging is just one of the security features available in YugabyteDB. You can use it to tell you the “who, what, when, where” of actions on your systems. The logs can then be sent to a log analysis system for archiving and correlation with other logs.
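YSQL audit logging builds on the pgaudit extension, whose entries are comma-separated fields after an `AUDIT:` marker. Before shipping logs to an analysis system, you might split those entries into named fields. Here’s a minimal sketch assuming the pgaudit field order; the sample line and its log prefix are hypothetical, and the who, when, and where usually come from the log line prefix rather than the audit fields themselves.

```python
import csv
import io

# Field order as documented by the pgaudit extension.
PGAUDIT_FIELDS = [
    "audit_type", "statement_id", "substatement_id", "class",
    "command", "object_type", "object_name", "statement", "parameter",
]

# Hypothetical log line: prefix, then the AUDIT: payload.
SAMPLE = ('2022-12-02 10:00:00 UTC [1234] LOG:  AUDIT: SESSION,2,1,DDL,CREATE TABLE,'
          'TABLE,public.employees,"CREATE TABLE employees (empno int, ename text);",'
          '<not logged>')


def parse_audit_line(line):
    """Split one pgaudit-style entry into named fields, or return None."""
    marker = "AUDIT: "
    pos = line.find(marker)
    if pos == -1:
        return None
    # The statement is quoted when it contains commas, so use the csv module.
    values = next(csv.reader(io.StringIO(line[pos + len(marker):])))
    return dict(zip(PGAUDIT_FIELDS, values))
```

The resulting dictionaries map naturally onto the structured fields most log analysis systems expect.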

Replication scenarios

Valerie Parham-Thompson

I recently put together a platform to demo a handful of scenarios related to YugabyteDB cross-cluster replication.

The code is here: https://github.com/dataindataout/xtest_ansible

This works on Mac (Apple M1) and should also work on newer Macs and on Linux. I’m not sure whether it works on Windows.

You will need a copy of YugabyteDB (2.16 or 2.17, depending on which branch of the demo code you use). Note that xCluster functionality improves greatly in 2.17, so test on that version or later if you can.

Development Environment for YugabyteDB on Mac M1

Valerie Parham-Thompson

Here’s a very quick way to set up YugabyteDB on your Mac for functional testing. It assumes you already have Homebrew installed.

brew tap yugabyte/yugabytedb
brew install yugabytedb

Later, you can upgrade to a newer version by running:

brew upgrade yugabytedb

Verify the installation and check the version:

yugabyted version

Set up local networking by adding loopback aliases, so each node can use its own IP address:

sudo ifconfig lo0 alias 127.0.0.2
sudo ifconfig lo0 alias 127.0.0.3

Then you can set up a three-node YugabyteDB cluster. Change the data directory if you’d like.
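Here’s a minimal sketch of one way to do that, assuming yugabyted’s `--advertise_address`, `--base_dir`, and `--join` flags and a hypothetical `~/yb-local/nodeN` data directory layout. This Python helper builds one `yugabyted start` command per loopback address, with every node after the first joining the first. It prints the commands by default so you can review them before running anything.

```python
import shlex
import subprocess
from pathlib import Path

# Loopback addresses from the ifconfig aliases; 127.0.0.1 exists by default.
ADDRESSES = ["127.0.0.1", "127.0.0.2", "127.0.0.3"]
# Hypothetical data directory root; change it if you'd like.
BASE = Path.home() / "yb-local"


def start_commands(addresses=ADDRESSES, base=BASE):
    """One `yugabyted start` per node; nodes after the first join the first."""
    cmds = []
    for n, addr in enumerate(addresses, start=1):
        cmd = ["yugabyted", "start",
               f"--advertise_address={addr}",
               f"--base_dir={base / f'node{n}'}"]
        if n > 1:
            cmd.append(f"--join={addresses[0]}")
        cmds.append(cmd)
    return cmds


def run_all(dry_run=True):
    """Print the commands by default; set dry_run=False to run them in order."""
    for cmd in start_commands():
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Starting the nodes in order matters here: the first node has to be up before the others can join it.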