RDBMS

Correct Partition Endpoints

Valerie Parham-Thompson

I was recently reviewing a database partitioning definition in YugabyteDB (using YSQL, its PostgreSQL-compatible API) and realized the partition distribution might not be what the developer intended.

What is database partitioning?

Database partitioning is used to divide large tables into smaller tables (partitions). While the data is physically separate, the application can access the data logically as a single table.

This can improve performance through a process called partition pruning: the database planner skips partitions that cannot hold the rows a query asks for. For example, if a table is partitioned by month, a query on a single month only has to read the one partition for that month.
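To make pruning concrete, here is a minimal sketch of monthly range partitioning in PostgreSQL syntax (YSQL should accept the same statements). The table and partition names are hypothetical; this is not the definition from the review. In PostgreSQL range partitions the FROM bound is inclusive and the TO bound is exclusive, so each month's upper bound is the first day of the following month. Reusing that value as the next partition's lower bound leaves no gap and no overlap.

```sql
-- Hypothetical table, partitioned by month on a timestamp column.
DROP TABLE IF EXISTS events;
CREATE TABLE events (
    id       bigint    NOT NULL,
    event_at timestamp NOT NULL,
    payload  text
) PARTITION BY RANGE (event_at);

-- FROM is inclusive, TO is exclusive: each month ends at the first
-- instant of the next month, which is also the next partition's FROM.
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE events_2024_03 PARTITION OF events
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- One row on each side of the January/February boundary.
INSERT INTO events VALUES
    (1, '2024-01-31 23:59:59', 'last second of January'),
    (2, '2024-02-01 00:00:00', 'first instant of February');

-- With constant bounds in the WHERE clause, the planner can prune at
-- plan time: the plan should list a scan of events_2024_02 only.
EXPLAIN
SELECT * FROM events
WHERE event_at >= '2024-02-01' AND event_at < '2024-03-01';
```

Because the February row sits exactly on the shared bound, it lands in `events_2024_02`, not `events_2024_01`; that is the behavior to check whenever partition endpoints are written by hand.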

Why You Need a Default Partition

Valerie Parham-Thompson

Postgres and YugabyteDB allow you to define partitions of parent tables. Partitions are useful in at least two ways:

  1. You can take advantage of partition pruning. The database doesn't need to look at partitions that it knows cannot contain rows matching the query's conditions.
  2. You can easily archive data by detaching and/or dropping partitions instead of running expensive DELETE queries.
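The archiving pattern in the second point can be sketched in PostgreSQL as follows. The `logs` table and its yearly partitions are hypothetical. `ALTER TABLE ... DETACH PARTITION` turns a partition into an ordinary standalone table, so its rows disappear from the parent without a DELETE; the detached table can then be archived or simply dropped.

```sql
-- Hypothetical log table with one partition per year.
DROP TABLE IF EXISTS logs;
DROP TABLE IF EXISTS logs_2023;  -- in case an earlier run left it detached
CREATE TABLE logs (
    logged_at date NOT NULL,
    msg       text
) PARTITION BY RANGE (logged_at);

CREATE TABLE logs_2023 PARTITION OF logs
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE logs_2024 PARTITION OF logs
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

INSERT INTO logs VALUES ('2023-06-01', 'old'), ('2024-06-01', 'new');

-- Detach: logs_2023 becomes a standalone table and its rows leave the parent.
ALTER TABLE logs DETACH PARTITION logs_2023;

-- After copying logs_2023 to an archive (not shown), drop it outright.
DROP TABLE logs_2023;
```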

Here's one gotcha I ran into recently. What happens if you insert a row into a partitioned table, but there's no partition for it? The insert fails with an error. See below for a reproduction of this scenario.
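Here is a minimal PostgreSQL sketch of the failure and the fix; it is an illustration, not necessarily the reproduction from the full post. The `orders` table and its partitions are hypothetical. PostgreSQL rejects a row that matches no partition with the error "no partition of relation ... found for row" (SQLSTATE class `check_violation`); a DEFAULT partition, available since PostgreSQL 11, catches such rows instead. The failing INSERT is wrapped in a DO block so the script can continue past the error.

```sql
-- Hypothetical table with a single partition covering 2024 only.
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
    id       bigint NOT NULL,
    order_at date   NOT NULL
) PARTITION BY RANGE (order_at);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- A 2025 row matches no partition, so the INSERT fails. The exception
-- handler reports the error and rolls back only this block.
DO $$
BEGIN
    INSERT INTO orders VALUES (1, '2025-03-15');
    RAISE NOTICE 'insert unexpectedly succeeded';
EXCEPTION WHEN check_violation THEN
    RAISE NOTICE 'insert failed: %', SQLERRM;
END $$;

-- Add a catch-all default partition, then retry the same row.
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
INSERT INTO orders VALUES (1, '2025-03-15');

-- tableoid shows which partition each row actually landed in.
SELECT tableoid::regclass AS partition, * FROM orders;
```

One trade-off to keep in mind: if you later create a partition whose range covers rows already sitting in the default partition, PostgreSQL refuses to create it until those rows are moved out.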

Generate Random Data

Valerie Parham-Thompson

I had to create a 10-million-row table for testing recently, and put together a query to generate random data for it.

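-- Assumed table definition (not shown in the original post):
-- CREATE TABLE my_table (id int, mydatetime timestamp, string1 text, string2 text);
INSERT INTO my_table
(id,
mydatetime,
string1,
string2)

SELECT
-- random integer from 10 to 80, each value equally likely
floor(random() * 71)::int + 10,
-- random instant within the first 365 days of 2024
TIMESTAMP '2024-01-01 00:00:00.000000' + interval '1 millisecond' * (random() * 86400 * 1000 * 365),
-- one of four strings with equal probability (arrays are 1-based);
-- floor() avoids the rounding bias of a plain ::int cast
(array['alligator','bear','cat','dog'])[floor(random() * 4)::int + 1],
-- random 10-character hex string
substr(md5(random()::text), 1, 10)

-- 10 rows here; raise the upper bound to 10000000 for 10 million rows
FROM generate_series(1, 10);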

The id field is just a random integer in this example, but you’d probably use an identity column.
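A minimal sketch of the identity-column variant, assuming a hypothetical table `my_table_identity` whose column types are guesses (the original post doesn't show the table). With `GENERATED ALWAYS AS IDENTITY`, you leave `id` out of the INSERT column list and PostgreSQL assigns sequential values.

```sql
-- Hypothetical table; column types are assumptions.
DROP TABLE IF EXISTS my_table_identity;
CREATE TABLE my_table_identity (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    mydatetime timestamp,
    string1    text,
    string2    text
);

-- id is omitted; the identity column fills it in (1, 2, 3, ...).
INSERT INTO my_table_identity (mydatetime, string1, string2)
SELECT
    TIMESTAMP '2024-01-01' + interval '1 millisecond' * (random() * 86400 * 1000 * 365),
    (array['alligator','bear','cat','dog'])[floor(random() * 4)::int + 1],
    substr(md5(random()::text), 1, 10)
FROM generate_series(1, 10);
```

`GENERATED ALWAYS` rejects explicit id values unless the INSERT says `OVERRIDING SYSTEM VALUE`; `GENERATED BY DEFAULT` is the looser alternative if you sometimes need to supply ids yourself.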

Open Source Database

Valerie Parham-Thompson

I’m an open-source database consultant. But which open-source database? Well, several of them.

I made the decision several years ago to take every opportunity to work with multiple databases. Why?

  1. Learning a new language teaches you more about your own. For example, taking time to understand SSTables in Cassandra gave me more insight into how storage works in MySQL. Having these experiences across multiple databases forced me to question what I knew about internals, which deepened my understanding overall.

Timestamps Postgres Migration

Valerie Parham-Thompson

Math… the universal language. Timestamps, not so much.

The way we write dates and times differs across both programming languages and human languages. The format also differs across SQL implementations. For example, Oracle and Postgres accept very different input formats for the timestamp data type.

Oracle allows a wide variety of punctuation in dates: hyphens, slashes, commas, periods, colons. Postgres supports a more limited list.
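One way to handle this during a migration, sketched here as an assumption rather than the approach from the full post, is to parse legacy strings explicitly with PostgreSQL's `to_timestamp(text, text)` and a format mask, instead of relying on the timestamp input parser and its DateStyle-dependent day/month ordering. The `legacy_dates` staging table and the dotted day.month.year sample values are hypothetical. The mask patterns (`DD`, `MM`, `YYYY`, `HH24`, `MI`, `SS`) resemble Oracle's `TO_DATE` masks, which can make porting easier.

```sql
-- Hypothetical staging table holding date strings exported from Oracle.
DROP TABLE IF EXISTS legacy_dates;
CREATE TABLE legacy_dates (raw_value text);
INSERT INTO legacy_dates VALUES
    ('15.03.2024 13:45:30'),
    ('01.12.2023 08:05:00');

-- to_timestamp applies an explicit format mask. It returns
-- timestamp with time zone, so cast to plain timestamp when the
-- target column has no time zone.
SELECT raw_value,
       to_timestamp(raw_value, 'DD.MM.YYYY HH24:MI:SS')::timestamp AS converted
FROM legacy_dates;
```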