entropy-arbitrage/2020-04-05-database.md
2021-07-21 07:36:20 -04:00

5 KiB

layout title date tags summary thumbnail offset update
post Database Basics 2020-04-05 08:55:42-04:00
database
intro
education
preparation
Some background on databases /blog/assets/laptop-computer-writing-keyboard-brand-design-104014-pxhere.com.png -25% 2020-02-05-recutils.md

My original plan for this Sunday was an outline for people sheltering in place who would like a path towards changing careers. That's running slow, though, and part of that required understanding what a database is and how they work.

Laptop showing data

Conveniently, when I posted [my recutils rundown]({% post_url 2020-02-05-recutils %}) to DEV, I added some general information on databases, so that'll be this post and the career-hopping post will come next Sunday.

Not that you needed to know any of this, since I never announced my plans...

Background on Databases

Before I jump into recutils itself, if you're not already conversant in databases, I'll try to give a capsule version. For example, If you've taken the undergraduate database class or have written more than a few lines of SQL, you can probably safely skip down to Recutils, in General.

For the rest of you, here are some definitions, and then I'll go over some basics.

A database is a collection of related data, regardless of the form.

{% pull store data elements based on their relationships to other elements. %}

A relational database is a database---surprise!---where we store data elements based on their relationships (surprise again!) to other elements. Sometimes those relationships are described as records, and sometimes they're described by metadata that connects separate records. The full definition centers on Codd's Rules.

A database management system (DBMS) is software that mediates access to a database to minimize the chances of a user or program shooting itself in the foot. For example, it might (depending on the system) moderate concurrent access or prevent you from deleting a record that another record points to.

Every database has a universe of discourse or mini-world that explains how to interpret the data, whether anybody bothered to write it down. We all somehow agree to never talk about this, even though "what does this data actually mean?" is one of the most important questions we can be asking, so I'm bringing it up here...

Instead of a universe of discourse, the thing we really get is a schema, the layout of what we store in each kind of record and how they relate to each other through foreign keys. I'll explain foreign keys in a bit, rather than try to assemble a coherent definition.

With that out of the way, record sets are called "relations" by theoreticians and implemented as tables, with records implemented as rows and individual elements as columns. The terms are mostly interchangeable, but you'll find table/row/column used by just about everybody in industry.

The last thing you probably need to know (unless you want to implement a DBMS) is that finding data ("querying") has what amounts to three steps.

{% pull "foreign key," a...pointer into that other table's space. %}

First, we join the relations/tables we need, by which I mean taking records/rows that are connected through a common element/field/column. That commonality is (and should be specified as) a "foreign key," a kind of pointer into that other table's space. Even if we only care about one table, we still need to specify it.

Second, we filter the available records down to those whose elements are interesting to us, similar to writing an if statement in general programming. We might want all the records, of course, which makes this step trivial. Or this might be extremely complicated, depending on what we want.

Third, we project the remaining records to just the fields that interest us. Again, we might want every field/column, making this part easier to specify.

If you're familiar with SQL, you'll probably recognize the SELECT column-or-columns FROM table-or-tables WHERE conditions-are-true format, where the FROM might list tables arbitrarily or might explicitly use JOIN to connect the tables. If you're not, recutils doesn't use SQL, so don't worry about it.

In a full database course, from here, you'd talk more about those query languages like SQL and then look at how to implement a database management system. But for purposes of reading more about databases and tools that use databases, this should give you a decent footing.


Credits: The header image is untitled by an anonymous PxHere photographer, made available under the CC0 1.0 Universal Public Domain Dedication.