SQL general

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the databases category.

Last Updated: 2025-06-30

How to clear a DB

The CASCADE is usually necessary to handle foreign relations
The RESTART IDENTITY resets sequences

TRUNCATE "users", "animals" RESTART IDENTITY CASCADE;

TRUNCATE acquires an ACCESS EXCLUSIVE lock on each table it operates on, which blocks all other concurrent operations on the table. If concurrent access to a table is required, then the DELETE command should be used instead.

Materialized views

Like regular views, but persist the results in a table-like form.

CREATE MATERIALIZED VIEW mymatview AS SELECT age FROM users;

This can get out of date, so you should only use it when some out-of-dateness is tolerable

Refresh with the following command

REFRESH MATERIALIZED VIEW mymatview;

The materialized view cannot subsequently be directly updated

Bind Parameters

Bind parameters—also called dynamic parameters or bind variables—are an alternative way to pass data to the database. Instead of putting the values directly into the SQL statement, you just use a placeholder like ?, :name or @name and provide the actual values using a separate API call

dbh.prepare("select first_name, last_name"
          + "  from employees"
          + " where subsidiary_id = ?");
dbh.execute(subsidiary_id);

Why use these?

Bind variables are the best way to prevent SQL injection.
Databases with an execution plan cache like SQL Server and the Oracle database can reuse an execution plan when executing the same statement multiple times. It saves effort in rebuilding the execution plan but works only if the SQL statement is exactly the same. When using bind parameters you do not write the actual values but instead insert placeholders into the SQL statement. That way the statements do not change when executing them with different values. Naturally there are exceptions, for example if the affected data volume depends on the actual values - e.g. if the column is subsidiary_id and some subsidiaries have far more or far fewer rows than others, then you lose performance. When using bind parameters, the optimizer has no concrete values available to determine their frequency. It then just assumes an equal distribution. From this perspective, it is a little bit paradoxical that bind parameters can improve performance if not using bind parameters enables the optimizer to always opt for the best execution plan. But the question is at what price? Generating and evaluating all execution plan variants is a huge effort that does not pay off if you get the same result in the end anyway. NB: As the developer, you can use bind parameters deliberately to help resolve this dilemma. That is, you should always use bind parameters except for values that shall influence the execution plan. Unevenly distributed status codes like “todo” and “done” are a good example. The number of “done” entries often exceeds the “todo” records by an order of magnitude.

Effect of hardware on speed

Performance has two dimensions: response time and throughput.

Bigger hardware is not always faster—but it can usually handle more load. Bigger hardware is more like a wider highway than a faster car: you cannot drive faster—well, you are not allowed to—just because there are more lanes.

Scaling horizontally (adding more servers) has similar limitations. Although more servers can process more requests, they do not improve the response time for one particular query. To make searching faster, you need an efficient search tree.

The more complex the infrastructure gets, the more latencies accumulate and the slower the responses become. Some example latencies: DB on another machine across a network; firewall inserted between application server and DB; This effect often leads to the counterintuitive observation that the expensive production hardware is slower than the cheap desktop PC environment that was used for development.

SSD is a big improvement over spinning HDs

Full text search with LIKE

WHERE text_column LIKE '%XYZ%'

Autocommit

By default "autocommit" is enabled in databases. This means each individual SQL statement is treated as a transaction and automatically committed right after it is executed.

Sometimes your ORM etc. might switch autocommit off, usually to give more control over transactions and allow you to group two or more statements into a transaction. In this cases, you have to call something like connection.commit() to persist any changes to the DB.

Atomic

Atomicity is the defining property of database transactions. atomic allows us to create a block of code within which the atomicity on the database is guaranteed. If the block of code is successfully completed, the changes are committed to the database. If there is an exception, the changes are rolled back.

These can be nested. E.g. Django

opens a transaction when entering the outermost atomic block;
creates a savepoint when entering an inner atomic block;

Savepoints

A savepoint is a marker within a transaction that enables you to roll back part of a transaction, rather than the full transaction. Use it to mark a point in a transaction known to be in a good state.

Savepoints aren’t especially useful if you are using autocommit, the default behavior of Django. However, once you open a transaction with atomic(), you build up a series of database operations awaiting a commit or rollback. If you issue a rollback, the entire transaction is rolled back. Savepoints provide the ability to perform a fine-grained rollback, rather than the full rollback that would be performed by transaction.rollback().

Aggregates and Fan out

The following SQL can give the wrong result. Specifically, the count of visits will be more than all the visits in the original customer table used in the join.

SELECT    SUM(visits)
FROM      customer
LEFT JOIN order
ON        customer.customer_id = order.customer_id

Why? Because the join result has more rows that the original/primary table (fan out). Therefore there will be double counting.

When is fan-out a risk? - 1-to-Many - If one row of your primary table can match up with multiple rows in your joined table,

When is it not a risk? - 1:1 joins - If one row of your primary table only ever matches up with one row of your joined table - Many:1 If many rows of your primary table (the one you want to aggregate) match up with the same row in your joined table, you still count the rows in the primary table only once -- thereefore you are safe.

Semicolon & Sons