Query Optimization

How A* Improves on Dijkstra (and When It Doesn't)

How A* Improves on Dijkstra (and When It Doesn't)

Valerie Parham-Thompson

In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. The Dijkstra algorithm works for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.

Routing APIs Compared: Choosing the Right Shortest-Path Service

Routing APIs Compared: Choosing the Right Shortest-Path Service

Valerie Parham-Thompson

In my previous post on pgRouting, I showed how to run shortest-path queries directly inside PostgreSQL. That approach works well when your road data is already in Postgres and your network is moderate-sized. But what happens when you need live traffic data, global coverage, or routing at thousands of queries per second? That is where external routing APIs and dedicated routing engines come in.

I have evaluated five routing services that cover the spectrum from fully managed APIs to self-hosted open-source engines. Each one makes different trade-offs around cost, speed, customization, and data freshness. Here is what I found.

Getting Started with pgRouting: Shortest Paths in PostgreSQL

Getting Started with pgRouting: Shortest Paths in PostgreSQL

Valerie Parham-Thompson

If your application needs to answer “what is the fastest route between two points,” you might reach for an external routing API like Mapbox Directions. But if your spatial data is already stored in PostgreSQL, the Postgres extension pgRouting lets you run graph-based routing queries right where the data is.

PostGIS gives you spatial data types, indexes, and operations like distance calculations, intersections, and buffers, but it has no concept of navigating a network from one point to another. pgRouting fills that gap by adding graph traversal and shortest-path algorithms on top of PostGIS. I recommend it for teams that already use PostGIS and want to keep routing logic close to the data.

Using pg_stat_statements for Query profiling and performance tuning

pg_stat_statements is an extension that tracks execution statistics for every normalized SQL statement.

Valerie Parham-Thompson

Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.

pg_stat_statements is an extension that tracks execution statistics for every normalized (fingerprinted) SQL statement. Instead of logging millions of nearly-identical queries, it groups similar statements together (with constants replaced by placeholders), aggregating their execution metrics into a single fingerprint. This approach provides comprehensive query-level insights with minimal performance overhead and storage cost.

Query Optimization with HypoPG

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson

Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.

Understanding HypoPG

HypoPG is a PostgreSQL extension that allows you to create hypothetical indexes and see how they would affect your query plans without actually creating the indexes. This is particularly useful when:

Correct Partition Endpoints

Correct Partition Endpoints

Using the correct endpoints in YugabyteDB database partitioning

Valerie Parham-Thompson

I was recently reviewing a database partitioning definition in YugabyteDB (the postgres “ysql” API), and realized the partition distribution might not be what the developer intended.

What is database partitioning?

Database partitioning is used to divide large tables into smaller tables (partitions). While the data is physically separate, the application can access the data logically as a single table.

This can help performance through a process called partition pruning. The database planner skips partitions that don’t hold the data. For example, if a table is partitioned on months of the year, a query on a single month only has to access the rows in the single partition for that month.

Why You Need a Default Partition

Why You Need a Default Partition

Required default partitions to avoid lost data in Postgres and YugabyteDB

Valerie Parham-Thompson

Postgres and YugabyteDB allow you to define partitions of parent tables. Partitions are useful in at least two ways:

  1. You can take advantage of partition pruning. The database doesn’t need to look at partitions it knows won’t meet the parameters of the query.
  2. You can easily archive data by disconnecting and/or dropping partitions instead of managing expensive delete queries.

Here’s one gotcha I ran into recently. What happens if you insert a row into a partitioned table, but there’s no partition for it? The insert fails with an error – see below for a reproduction of this scenario.

Open Data College Scorecard

Open Data College Scorecard

Exploring the College Scorecard open data set from the Department of Education

Valerie Parham-Thompson

I’ve been pretty quiet recently, I know. My youngest has been going through the college application phase, which has taken a lot of time for both of us. I wouldn’t give up all the lovely college visits and overnights, but I might easily part ways with the paperwork.

I do have something fun to share from the experience. I didn’t like the limitations of common college search websites. In particular, we were looking for a college in a subset of surrounding states. Most college search forms allow you to enter only a region, and the Southeastern region was too broad for her search. I also didn’t like that signing up for the sites subjected you to a lot of marketing.

Foreign Data Wrappers

Foreign Data Wrappers

Using foreign data wrappers to combine metrics across a distributed database cluster

Valerie Parham-Thompson

I was recently setting up a demo to show off query logging features. Two common extensions, pg_stat_statements and pg_stat_monitor, store data locally. In the case of a distributed database, it is helpful to combine the query runtimes on all nodes.

YugabyteDB supports foreign data wrappers, so I decided to use this feature to combine query statistics from each of my three test nodes.

The libraries for the pg_stat_monitor extension are already installed, so the extension just needs to be created:

String Search

String Search

Link to YFTT fuzzy matching for string searches

Valerie Parham-Thompson

Quick post to share my presentation last week at the YugabyteDB Friday Tech Talk. It was on fuzzy matching, and more generally string searches. Got to nerd out on two of my favorite topics: words (broadly, linguistics and specifically, names) and databases. Check it out!

(Code for scenarios in my repo, here: https://github.com/dataindataout/xtest_ansible/tree/main/scenarios/fuzzy)

https://www.youtube.com/watch?v=vmHRnR1nFdQ