SQL general

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the databases category.

Last Updated: 2024-11-23

How to clear a DB

TRUNCATE "users", "animals" RESTART IDENTITY CASCADE;

TRUNCATE acquires an ACCESS EXCLUSIVE lock on each table it operates on, which blocks all other concurrent operations on the table. If concurrent access to a table is required, then the DELETE command should be used instead.

Materialized views

Like regular views, but persist the results in a table-like form.

CREATE MATERIALIZED VIEW mymatview AS SELECT age FROM users;

This can get out of date, so you should only use it when some out-of-dateness is tolerable

Refresh with the following command

REFRESH MATERIALIZED VIEW mymatview;

The materialized view cannot subsequently be directly updated

Bind Parameters

Bind parameters—also called dynamic parameters or bind variables—are an alternative way to pass data to the database. Instead of putting the values directly into the SQL statement, you just use a placeholder like ?, :name or @name and provide the actual values using a separate API call

dbh.prepare("select first_name, last_name"
          + "  from employees"
          + " where subsidiary_id = ?");
dbh.execute(subsidiary_id);

Why use these?

Effect of hardware on speed

Performance has two dimensions: response time and throughput.

Bigger hardware is not always faster—but it can usually handle more load. Bigger hardware is more like a wider highway than a faster car: you cannot drive faster—well, you are not allowed to—just because there are more lanes.

Scaling horizontally (adding more servers) has similar limitations. Although more servers can process more requests, they do not improve the response time for one particular query. To make searching faster, you need an efficient search tree.

The more complex the infrastructure gets, the more latencies accumulate and the slower the responses become. Some example latencies: DB on another machine across a network; firewall inserted between application server and DB; This effect often leads to the counterintuitive observation that the expensive production hardware is slower than the cheap desktop PC environment that was used for development.

SSD is a big improvement over spinning HDs

Full text search with LIKE

WHERE text_column LIKE '%XYZ%'

Autocommit

By default "autocommit" is enabled in databases. This means each individual SQL statement is treated as a transaction and automatically committed right after it is executed.

Sometimes your ORM etc. might switch autocommit off, usually to give more control over transactions and allow you to group two or more statements into a transaction. In this cases, you have to call something like connection.commit() to persist any changes to the DB.

Atomic

Atomicity is the defining property of database transactions. atomic allows us to create a block of code within which the atomicity on the database is guaranteed. If the block of code is successfully completed, the changes are committed to the database. If there is an exception, the changes are rolled back.

These can be nested. E.g. Django

Savepoints

A savepoint is a marker within a transaction that enables you to roll back part of a transaction, rather than the full transaction. Use it to mark a point in a transaction known to be in a good state.

Savepoints aren’t especially useful if you are using autocommit, the default behavior of Django. However, once you open a transaction with atomic(), you build up a series of database operations awaiting a commit or rollback. If you issue a rollback, the entire transaction is rolled back. Savepoints provide the ability to perform a fine-grained rollback, rather than the full rollback that would be performed by transaction.rollback().

Aggregates and Fan out

The following SQL can give the wrong result. Specifically, the count of visits will be more than all the visits in the original customer table used in the join.

SELECT    SUM(visits)
FROM      customer
LEFT JOIN order
ON        customer.customer_id = order.customer_id

Why? Because the join result has more rows that the original/primary table (fan out). Therefore there will be double counting.

When is fan-out a risk? - 1-to-Many - If one row of your primary table can match up with multiple rows in your joined table,

When is it not a risk? - 1:1 joins - If one row of your primary table only ever matches up with one row of your joined table - Many:1 If many rows of your primary table (the one you want to aggregate) match up with the same row in your joined table, you still count the rows in the primary table only once -- thereefore you are safe.

Resources