Processing Data with Pandas

Valerie Parham-Thompson

This week I’ve been experimenting with Pandas, processing historical NOAA weather data and storing it in a local YugabyteDB cluster. This open data set contains daily max/min temperatures and precipitation for years back to 1750 (not all data points are available for all years or locations). It’s available here: https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html

I leveraged my existing demo framework to provision a local YugabyteDB cluster, then used Pandas to import data from the txt and csv files. The txt lookup files cover countries, states, stations, and inventory. The csv weather data is available in several formats; the code I’ve linked below imports all weather data for a single year.
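
The core of the import is short enough to sketch here. This is a minimal, hypothetical version rather than the linked code itself: the file name, connection string, and table name are placeholders, and the column layout is assumed from the GHCN-Daily “by year” documentation.

    import pandas as pd
    from sqlalchemy import create_engine

    # YugabyteDB's YSQL layer speaks the Postgres wire protocol; the port (5433),
    # user, and database here are assumptions for a default local cluster.
    engine = create_engine("postgresql+psycopg2://yugabyte:yugabyte@127.0.0.1:5433/yugabyte")

    # The "by year" CSV files have no header row; this column layout is taken
    # from the GHCN-Daily documentation.
    cols = ["station", "date", "element", "value", "m_flag", "q_flag", "s_flag", "obs_time"]
    weather = pd.read_csv("2021.csv.gz", names=cols, dtype={"date": str, "obs_time": str})

    # Keep only the elements mentioned above: max/min temperature and precipitation.
    weather = weather[weather["element"].isin(["TMAX", "TMIN", "PRCP"])]

    # Bulk-load into YugabyteDB; to_sql creates the table if it doesn't exist.
    weather.to_sql("weather_by_year", engine, if_exists="append", index=False, chunksize=10000)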

Foreign Data Wrappers

Valerie Parham-Thompson

I was recently setting up a demo to show off query logging features. Two common extensions, pg_stat_statements and pg_stat_monitor, store their data locally on each node. With a distributed database, it is helpful to combine the query runtimes from all nodes.

YugabyteDB supports foreign data wrappers, so I decided to use this feature to combine query statistics from each of my three test nodes.
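
As a rough sketch of the mechanics, run from one node against another, the setup looks like this. The host, port, and credentials are placeholders for my local test cluster, and I’m using pg_stat_statements as the example view (pg_stat_monitor works the same way once the extension is created, below):

    -- postgres_fdw ships with YugabyteDB's Postgres layer.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- One server and user mapping per remote node (repeat for node3).
    CREATE SERVER node2 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'node2', port '5433', dbname 'yugabyte');
    CREATE USER MAPPING FOR CURRENT_USER SERVER node2
      OPTIONS (user 'yugabyte', password 'yugabyte');

    -- postgres_fdw matches columns by name, so a subset of the remote view's
    -- columns is enough; adjust schema_name to wherever the view lives on your nodes.
    CREATE FOREIGN TABLE node2_stat_statements (
      query      text,
      calls      bigint,
      total_time double precision
    ) SERVER node2 OPTIONS (schema_name 'public', table_name 'pg_stat_statements');

    -- Combine local and remote statistics in one result
    -- (assuming pg_stat_statements is also available locally).
    SELECT 'node1' AS node, query, calls, total_time FROM pg_stat_statements
    UNION ALL
    SELECT 'node2', query, calls, total_time FROM node2_stat_statements
    ORDER BY total_time DESC;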

The libraries for the pg_stat_monitor extension are already installed, so the extension just needs to be created:
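
    CREATE EXTENSION pg_stat_monitor;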

Provision Ansible Postgres on Mac

Valerie Parham-Thompson

I added a new database to my demo platform: Postgres. This code helps me provision Postgres on a Mac with Ansible for demo purposes or simple functional testing, and it is an extension of previous work I shared: https://valerieparhamthompson.com/posts/string-search/.

The script installs Postgres via Homebrew on an M1 Mac and starts it up, then creates the database, user, and other objects needed for the demo. Finally, it populates the database using my “million table” SQL.

Most of this uses the community.postgresql Ansible collection, found here: https://docs.ansible.com/ansible/latest/collections/community/postgresql/index.html
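
In outline, the playbook’s tasks look something like this. It’s a simplified sketch with hypothetical names; the formula version, database, user, and SQL file in the actual playbook differ:

    - hosts: localhost
      connection: local
      tasks:
        - name: Install Postgres via Homebrew
          community.general.homebrew:
            name: postgresql@14
            state: present

        - name: Start Postgres as a Homebrew service
          ansible.builtin.command: brew services start postgresql@14

        - name: Create the demo database
          community.postgresql.postgresql_db:
            name: demo
            login_user: "{{ lookup('env', 'USER') }}"

        - name: Create the demo user
          community.postgresql.postgresql_user:
            db: demo
            name: demo_user
            password: demo_password
            login_user: "{{ lookup('env', 'USER') }}"

        - name: Populate with the "million table" SQL
          ansible.builtin.command: psql -d demo -f million_tables.sql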