Posts

How A* Improves on Dijkstra (and When It Doesn't)

How A* Improves on Dijkstra (and When It Doesn't)

Valerie Parham-Thompson

In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. The Dijkstra algorithm works for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.

Routing APIs Compared: Choosing the Right Shortest-Path Service

Routing APIs Compared: Choosing the Right Shortest-Path Service

Valerie Parham-Thompson

In my previous post on pgRouting, I showed how to run shortest-path queries directly inside PostgreSQL. That approach works well when your road data is already in Postgres and your network is moderate-sized. But what happens when you need live traffic data, global coverage, or routing at thousands of queries per second? That is where external routing APIs and dedicated routing engines come in.

I have evaluated five routing services that cover the spectrum from fully managed APIs to self-hosted open-source engines. Each one makes different trade-offs around cost, speed, customization, and data freshness. Here is what I found.

Getting Started with pgRouting: Shortest Paths in PostgreSQL

Getting Started with pgRouting: Shortest Paths in PostgreSQL

Valerie Parham-Thompson

If your application needs to answer “what is the fastest route between two points,” you might reach for an external routing API like Mapbox Directions. But if your spatial data is already stored in PostgreSQL, the Postgres extension pgRouting lets you run graph-based routing queries right where the data is.

PostGIS gives you spatial data types, indexes, and operations like distance calculations, intersections, and buffers, but it has no concept of navigating a network from one point to another. pgRouting fills that gap by adding graph traversal and shortest-path algorithms on top of PostGIS. I recommend it for teams that already use PostGIS and want to keep routing logic close to the data.

Latency by Pekka Enberg

While many authors have written about database tuning, systems tuning, or code optimization, I haven't seen any come together to cover the whole stack in such a comprehensive way, targeting both software engineers and database architects.

Valerie Parham-Thompson

Having a shared vocabulary across database, software, and infrastructure teams is critical when working together to tune latency issues. I’ve been in many incident rooms where the only report is “the application is slow” and had to unwind a series of questions: What do you mean by slow? Where do you see this? What parts are slow? If everyone in the room had read Enberg’s Latency, solving these kinds of incidents would be much faster.

Using pg_stat_statements for Query profiling and performance tuning

pg_stat_statements is an extension that tracks execution statistics for every normalized SQL statement.

Valerie Parham-Thompson

Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.

pg_stat_statements is an extension that tracks execution statistics for every normalized (fingerprinted) SQL statement. Instead of logging millions of nearly-identical queries, it groups similar statements together (with constants replaced by placeholders), aggregating their execution metrics into a single fingerprint. This approach provides comprehensive query-level insights with minimal performance overhead and storage cost.

Query Optimization with HypoPG

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson

Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.

Understanding HypoPG

HypoPG is a PostgreSQL extension that allows you to create hypothetical indexes and see how they would affect your query plans without actually creating the indexes. This is particularly useful when:

Random Data Generation: Then and Now

Random Data Generation: Then and Now

Modern approaches to generating test data with Python Faker

Valerie Parham-Thompson

In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience with using the Faker library, which has become my go-to tool for creating realistic test datasets.

The Traditional SQL Approach

The traditional approach to generating test data relied heavily on SQL functions like RAND() and string manipulation. This method worked but had limitations:

College Scorecard Application

College Scorecard Application

Building an application to serve College Scorecard open data

Valerie Parham-Thompson

Still working on the College Scorecard dataset. Previously I explored the dataset in a real-world application, talked about how to clean the data, and worked with the data API.

Now I’ve decided I want to put this in a web application so others can use the dataset in the same flexible way that I have been using it. (Reminder that several popular college-search websites exist, but they are limited in the ways you can filter the data. Also they tend to gather personal data for I suppose ad generation.)

Finding the Right Yugabyte Api Endpoint

Finding the Right Yugabyte Api Endpoint

Tour through the YugabyteDB YBA API endpoints with a real-world example

Valerie Parham-Thompson

As YugabyteDB continues to evolve, its extensive API ecosystem offers powerful capabilities for database management and automation. However, with hundreds of API endpoints across overlapping categories, locating exactly the right API endpoint can be challenging. In this guide, I’ll walk you through several proven strategies for efficiently finding the API endpoints you need, along with real-world examples and pro tips I’ve learned from working with YugabyteDB’s API ecosystem.

Method 1: Navigating Categories in the API Documentation

The API documentation (api-docs.yugabyte.com) provides a well-organized categorical view of available endpoints. Understanding how to navigate these categories effectively will significantly speed up your API discovery process:

Handling Reserved Keywords in DSBulk for Seamless Data Migration

Handling Reserved Keywords in DSBulk for Seamless Data Migration

How to handle reserved keywords using Datastax DSBulk in YugabyteDB migration

Valerie Parham-Thompson

Migrating to YugabyteDB offers significant advantages in terms of high availability, global distribution, and horizontal scalability—features essential for managing modern database workloads. However, data migration can be a complex process, particularly when transforming your schema definition. Differences in datatype support, query syntax, and core features across systems can complicate the transformation.

One of the challenges is dealing with reserved keywords in the source schema that cannot be directly used in the target system. This can require changes not only in the database schema during transformation but also in application code and related tooling.