Archive for the ‘tricks’ Category

Mysql-Understand your engines

Thursday, July 24th, 2008

MySQL has two primary storange engines: MyISAM and InnoDB. Each has its own performance characteristics and considerations. In the broadest sense MyISAM is good for read-heavy data and InnoDB is good for write-heavy data, though there are cases where the opposite is true. The biggest gotcha is how the two differ with respect to the COUNT function.

MyISAM keeps an internal cache of table meta-data like the number of rows. This means that, generally, COUNT(*) incurs no additional cost for a well-structured query. InnoDB, however, has no such cache. For a concrete example, let’s say we’re trying to paginate a query. If you have a query SELECT * FROM users LIMIT 5,10, let’s say, running SELECT COUNT(*) FROM users LIMIT 5,10 is essentially free with MyISAM but takes the same amount of time as the first query with InnoDB. MySQL has a SQL_CALC_FOUND_ROWS option which tells InnoDB to calculate the number of rows as it runs the query, which can then be retreived by executing SELECT FOUND_ROWS(). This is very MySQL-specific, but can be necessary in certain situations, particularly if you use InnoDB for its other features (e.g., row-level locking, stored procedures, etc.).

SQL is Not C

Thursday, July 24th, 2008

C is the canonical procedural programming language and the greatest pitfall for a programmer looking to show off his database-fu is that he fails to realize that SQL is not procedural (nor is it functional or object-oriented, for that matter). Rather than thinking in terms of data and operations on data one must think of sets of data and relationships among those sets. This usually crops up with the improper use of a subquery:
SELECT a.id,
(SELECT MAX(created)
FROM posts
WHERE author_id = a.id)
AS latest_post
FROM authors a

Since this subquery is correlated, i.e., references a table in the outer query, one should convert the subquery to a join.
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
ON (a.id = p.author_id)
GROUP BY a.id

Mysql-Learn Your Indices

Thursday, July 24th, 2008

Often your choice of indices will make or break your database. For those who haven’t progressed this far in their database studies, an index is a sort of hash. If we issue the query SELECT * FROM users WHERE last_name = ‘Goldstein’ and last_name has no index then your DBMS must scan every row of the table and compare it to the string ‘Goldstein.’ An index is usually a B-tree (though there are other options) which speeds up this comparison considerably.

You should probably create indices for any field on which you are selecting, grouping, ordering, or joining. Obviously each index requires space proportional to the number of rows in your table, so too many indices winds up taking more memory. You also incur a performance hit on write operations, since every write now requires that the corresponding index be updated. There is a balance point which you can uncover by profiling your code. This varies from system to system and implementation to implementation.

Mysql-Don’t Overuse Artificial Primary Keys

Thursday, July 24th, 2008

Artificial primary keys are nice because they can make the schema less volatile. If we stored geography information in the US based on zip code, say, and the zip code system suddenly changed we’d be in a bit of trouble. On the other hand, many times there are perfectly fine natural keys. One example would be a join table for many-to-many relationships. What not to do:
CREATE TABLE posts_tags (
relation_id int UNSIGNED NOT NULL AUTO_INCREMENT,
post_id int UNSIGNED NOT NULL,
tag_id int UNSIGNED NOT NULL,
PRIMARY KEY(relation_id),
UNIQUE INDEX(post_id, tag_id)
);

Not only is the artificial key entirely redundant given the column constraints, but the number of post-tag relations are now limited by the system-size of an integer. Instead one should do:
CREATE TABLE posts_tags (
post_id int UNSIGNED NOT NULL,
tag_id int UNSIGNED NOT NULL,
PRIMARY KEY(post_id, tag_id)
);

Mysql-Partition Your Tables

Thursday, July 24th, 2008

Often you have a table in which only a few columns are accessed frequently. On a blog, for example, one might display entry titles in many places (e.g., a list of recent posts) but only ever display teasers or the full post bodies once on a given page. Horizontal vertical partitioning helps:
CREATE TABLE posts (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
author_id int UNSIGNED NOT NULL,
title varchar(128),
created timestamp NOT NULL,
PRIMARY KEY(id)
);

CREATE TABLE posts_data (
post_id int UNSIGNED NOT NULL,
teaser text,
body text,
PRIMARY KEY(post_id)
);

The above represents a situation where one is optimizing for reading. Frequently accessed data is kept in one table while infrequently accessed data is kept in another. Since the data is now partitioned the infrequently access data takes up less memory. You can also optimize for writing: frequently changed data can be kept in one table, while infrequently changed data can be kept in another. This allows more efficient caching since MySQL no longer needs to expire the cache for data which probably hasn’t changed.

Mysql-Creating a Derived Data Column

Thursday, July 24th, 2008

Frequently we query our database for a value but then have to always do the same manipulation on it before we can use it. Well it can often be useful to do this “once” in the database and then store the manipulated value in a separate column of the database.

Here is a very simple example
INSERT INTO tbl_name (col1,col2) VALUES(15,col1*3.14159);

Where col2 is set to the value of col1*pi

You could also do more complex calculation/filtering/shortening in your code eg php,ASP etc

Mysql-Querying By Date

Thursday, July 24th, 2008

We are so frequently required to create date based queries, eg how many orders last week, last month, how many orders between two dates. It is necessary to avoid the trap of treating date as a number, instead convert the date into a handy unit such as days

# select all records from today
select * from tbl_orders where TO_DAYS(NOW()) = TO_DAYS(dtmdevdate);

# select all records from last three days
select * from tbl_orders where (TO_DAYS(NOW())-3) > TO_DAYS(orderdate);

MySQL provides a whole range of date functions become familiar with them here

Mysql-Test your MySQL queries inside a MySQL GUI before coding them

Thursday, July 24th, 2008

Before you start coding your application with PHP or whatever remember to test out your MySQL in your MySQL GUI such as SQLYOG or even the CLI

What are the advantages?

* You don’t have to worry about the Script/MySQL syntax clashes (quoting etc)
* You only have to think MySQL and not say PHP as well
* You can see the data produced or not produced
* You can test your concepts

Mysql-Synchronize data-types

Thursday, July 24th, 2008

Columns with identical information in different tables should be declared to have identical data types so that joins based on the corresponding columns will be faster.

Mysql-Use multiple-row inserts

Thursday, July 24th, 2008

Use multiple-row INSERT statements to store many rows with one SQL statement.