This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the databases category.
Last Updated: 2025-01-18
CASCADE
is usually necessary to handle foreign relationsRESTART IDENTITY
resets sequencesTRUNCATE "users", "animals" RESTART IDENTITY CASCADE;
TRUNCATE
acquires an ACCESS EXCLUSIVE
lock on each table it operates on, which blocks all other concurrent operations on the table. If concurrent access to a table is required, then the DELETE command should be used instead.
Like regular views, but persist the results in a table-like form.
CREATE MATERIALIZED VIEW mymatview AS SELECT age FROM users;
This can get out of date, so you should only use it when some out-of-dateness is tolerable
Refresh with the following command
REFRESH MATERIALIZED VIEW mymatview;
The materialized view cannot subsequently be directly updated
Bind parameters—also called dynamic parameters or bind variables—are an alternative way to pass data to the database. Instead of putting the values directly into the SQL statement, you just use a placeholder like ?, :name or @name and provide the actual values using a separate API call
dbh.prepare("select first_name, last_name"
+ " from employees"
+ " where subsidiary_id = ?");
dbh.execute(subsidiary_id);
Why use these?
subsidiary_id
and some subsidiaries have far more or far fewer rows than others, then you lose performance. When using bind parameters, the optimizer has no concrete values available to determine their frequency. It then just assumes an equal distribution. From this perspective, it is a little bit paradoxical that bind parameters can improve performance if not using bind parameters enables the optimizer to always opt for the best execution plan. But the question is at what price? Generating and evaluating all execution plan variants is a huge effort that does not pay off if you get the same result in the end anyway. NB: As the developer, you can use bind parameters deliberately to help resolve this dilemma. That is, you should always use bind parameters except for values that shall influence the execution plan. Unevenly distributed status codes like “todo” and “done” are a good example. The number of “done” entries often exceeds the “todo” records by an order of magnitude.Performance has two dimensions: response time and throughput.
Bigger hardware is not always faster—but it can usually handle more load. Bigger hardware is more like a wider highway than a faster car: you cannot drive faster—well, you are not allowed to—just because there are more lanes.
Scaling horizontally (adding more servers) has similar limitations. Although more servers can process more requests, they do not improve the response time for one particular query. To make searching faster, you need an efficient search tree.
The more complex the infrastructure gets, the more latencies accumulate and the slower the responses become. Some example latencies: DB on another machine across a network; firewall inserted between application server and DB; This effect often leads to the counterintuitive observation that the expensive production hardware is slower than the cheap desktop PC environment that was used for development.
SSD is a big improvement over spinning HDs
WHERE text_column LIKE '%XYZ%'
By default "autocommit" is enabled in databases. This means each individual SQL statement is treated as a transaction and automatically committed right after it is executed.
Sometimes your ORM etc. might switch autocommit off, usually to give more control over transactions and allow you to group two or more statements into a transaction. In this cases, you have to call something like connection.commit()
to persist any changes to the DB.
Atomicity is the defining property of database transactions. atomic allows us to create a block of code within which the atomicity on the database is guaranteed. If the block of code is successfully completed, the changes are committed to the database. If there is an exception, the changes are rolled back.
These can be nested. E.g. Django
A savepoint is a marker within a transaction that enables you to roll back part of a transaction, rather than the full transaction. Use it to mark a point in a transaction known to be in a good state.
Savepoints aren’t especially useful if you are using autocommit, the default behavior of Django. However, once you open a transaction with atomic(), you build up a series of database operations awaiting a commit or rollback. If you issue a rollback, the entire transaction is rolled back. Savepoints provide the ability to perform a fine-grained rollback, rather than the full rollback that would be performed by transaction.rollback().
The following SQL can give the wrong result. Specifically, the count of visits will be more than all the visits in the original customer
table used in the join.
SELECT SUM(visits)
FROM customer
LEFT JOIN order
ON customer.customer_id = order.customer_id
Why? Because the join result has more rows that the original/primary table (fan out). Therefore there will be double counting.
When is fan-out a risk? - 1-to-Many - If one row of your primary table can match up with multiple rows in your joined table,
When is it not a risk? - 1:1 joins - If one row of your primary table only ever matches up with one row of your joined table - Many:1 If many rows of your primary table (the one you want to aggregate) match up with the same row in your joined table, you still count the rows in the primary table only once -- thereefore you are safe.