<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MySQL Expert - Nilesh Pawar</title>
	<atom:link href="http://www.nileshpawar.com/mysqlexpert/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nileshpawar.com/mysqlexpert</link>
	<description>Providing Perfect Solutions For You</description>
	<lastBuildDate>Tue, 21 Jun 2011 13:16:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>MySQL IF() statement</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/mysql-if-statement/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/mysql-if-statement/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 13:16:47 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=82</guid>
		<description><![CDATA[The MySQL IF() function works like an ordinary if/else; only the syntax differs, so the condition has to be written the MySQL way. Use it when you need to test a condition inside an SQL statement. Example below: $sql="SELECT * FROM users WHERE DATEDIFF(CURDATE(), IF(register_date IS NULL, CURDATE(), register_date)) = 1"; MySQL IF() takes 3 expressions,]]></description>
			<content:encoded><![CDATA[<div>
<div>The MySQL IF() function works like an ordinary if/else; only the syntax differs, so the condition has to be written the MySQL way. Use it when you need to test a condition inside an SQL statement. Example below:</div>
<div></div>
<div><code>$sql=</code><code>"SELECT * FROM users WHERE DATEDIFF(CURDATE(), IF(register_date IS NULL, CURDATE(), register_date)) = 1"</code><code>;</code></div>
</div>
<p>MySQL IF() takes 3 expressions: IF(expr1, expr2, expr3). expr1 is the condition; if it is TRUE, IF() returns expr2, otherwise it returns expr3. The example above fetches all users who registered yesterday, that is, rows where the difference between today and the register date equals 1 day. Inside DATEDIFF(), IF() checks whether the register date is NULL: if it is, it returns CURDATE(), otherwise it returns the register date. For rows whose register date is NULL the comparison therefore becomes DATEDIFF(CURDATE(),CURDATE()), which is 0, and those rows are filtered out, as DATEDIFF(CURDATE(),CURDATE())&lt;&gt;1. Note that the NULL test has to be written as register_date IS NULL: a comparison with = NULL never evaluates to TRUE, and putting the column name in single quotes turns it into a string literal.</p>
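<p>The NULL behaviour described above can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module purely as a convenient stand-in for a MySQL server; the comparison rules it shows come from standard SQL and are shared by MySQL. The table contents and the <code>names</code> helper are made up for illustration.</p>

```python
import sqlite3

# SQLite is used only as a stand-in; the NULL comparison rules shown here
# come from standard SQL and are shared by MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, register_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", "2011-06-20"), ("bob", None)],
)

def names(where_clause):
    """Return the user names matched by a WHERE clause, in name order."""
    sql = "SELECT name FROM users WHERE " + where_clause + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

eq_null = names("register_date = NULL")      # comparison yields NULL, never TRUE
quoted = names("'register_date' = NULL")     # string literal compared with NULL
is_null = names("register_date IS NULL")     # the correct NULL test

print(eq_null, quoted, is_null)
```

<p>Only the IS NULL form finds the row without a register date; both = NULL variants match nothing.</p>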
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/mysql-if-statement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Warehousing with MySQL and Infobright</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/data-warehousing-with-mysql-and-infobright/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/data-warehousing-with-mysql-and-infobright/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 03:53:27 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[Warehousing]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=80</guid>
		<description><![CDATA[Unless you’ve been on a desert island lately, you probably know that the area of data warehousing/analytics/business intelligence (BI) is going gangbusters these days. Not many years ago, when industry analyst groups polled CIOs on their top priorities, BI was #10. Then it jumped to #2 in 2006, and today it’s #1 according to Gartner]]></description>
			<content:encoded><![CDATA[<p>Unless you’ve been on a desert island lately, you probably know that the area of data warehousing/analytics/business intelligence (BI) is going gangbusters these days. Not many years ago, when industry analyst groups polled CIOs on their top priorities, BI was #10. Then it jumped to #2 in 2006, and today it’s #1 according to Gartner. It’s no mystery why: it’s a cut-throat economy out there in all industries, and smart businesses need to tap their internal data to make critical business decisions, both tactically and strategically, to stay ahead of the pack.</p>
<p>But here’s the rub: putting together a major BI landscape in a company can cost some serious cash. The better-bring-your-AMEX-goldcard attitude of a lot of data warehousing and BI vendors has caused some frustration on the part of many IT directors; in fact, a 2007 InformationWeek survey found that 39% of IT executives complained that software licensing costs prohibited them from rolling out the BI initiatives they want.</p>
<p>But enter open source! Just as open source has revolutionized many areas of software these days and opened the doors for nearly any company to craft a competitive IT environment of operating systems, development tools, and databases, it’s now doing the same thing in the area of data warehousing and BI. Witness the fact that data warehousing (as indicated on MySQL’s last major community and customer poll) is now the fifth most common use case for MySQL.</p>
<p>But here’s another statistic to digest: according to TDWI, the average growth rate of data warehouses ranges between 33 and 50% per year, and that’s conservative for some businesses. The most popular storage engine used for MySQL data warehousing today is MyISAM (second is InnoDB), and it does just fine performance-wise up to around 1TB of data. After that, a lot of users tend to partition their data warehouse across more than one server to help performance. Now, stats from the analyst group IDC say that the majority of data warehouses are 6TB and under (only 4% are over 25TB according to IDC), so that means most folks these days are looking to manage warehouses between hundreds of GB and 6TB.</p>
<p>If this is you, and you want to stay with a MySQL-based solution, then you owe it to yourself to check out the Infobright storage engine. Infobright, one of MySQL/Sun’s partners, supplies an engine that breaks through the limitations of MyISAM and other storage engines and delivers a very sophisticated piece of technology that (surprisingly…) doesn’t require heavy lifting when it comes to install, setup, and database design to get incredibly fast response times.</p>
<p>Let’s first take a look at the Infobright architecture to see how it accomplishes these things and then take it for a test drive to see how well it really performs on analytic-styled queries.</p>
<h2>Columns are Cool</h2>
<p>In a nutshell, the Infobright storage engine is a column-oriented architecture that combines a high-speed data loader, strong levels of data compression, and a smart, external optimizer and ‘knowledge grid’ to deliver impressive data warehousing capabilities. From a MySQL standpoint, Infobright presents itself just as any other storage engine would, so there’s nothing really new to learn about the interface. Infobright is a separate install from the general MySQL server, but it installs very simply (faster than a standard MySQL install, I found) and is only a 17MB download, which is pretty remarkable for an engine that can scale up to managing tens of terabytes.</p>
<p>The first thing to note is that the Infobright engine is a column-oriented design. Column-oriented databases have been around for a while now (e.g. Sybase IQ), but are now coming on strong in popularity due to their strength in servicing data warehouse needs. In his March 2008 research report “What’s Cool About Columns”, Philip Howard of Bloor Research writes, “For much of the last decade the use of column-based approaches has been very much a niche activity. However … we believe that it is time for columns to step out of the shadows to become a major force in the data warehouse and associated markets.” He then adds, “Columns provide better performance at a lower cost with a smaller footprint: it is difficult to understand why any company seriously interested in query performance would not consider a column-based solution.”</p>
<p>Why would he make such statements? Because column-oriented designs like Infobright really do pack quite a punch in data warehousing situations. There are numerous reasons for this being the case, but here are just a few. Most data warehousing/analytic queries are only interested in a couple of columns in one or more tables rather than entire rows of columns. This being the case, storing data in the typical row-based format is inefficient for data warehouse purposes whereas storing data in a column-oriented fashion is much better. In a column-oriented design like Infobright, full table scans will never be performed (unless a query requests all or the majority of rows in a table); only full column scans may be. The end result is much less I/O and improved response times for column-oriented databases.</p>
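<p>The I/O argument can be put into numbers with a back-of-the-envelope sketch. The table shape below (20 columns of 8 bytes each, no compression) and the function names are assumptions for illustration, not measurements from Infobright.</p>

```python
def bytes_scanned_row_store(n_rows, column_widths):
    """A row store reads whole rows, so every column contributes."""
    return n_rows * sum(column_widths.values())

def bytes_scanned_column_store(n_rows, column_widths, needed):
    """A column store reads only the columns the query references."""
    return n_rows * sum(column_widths[name] for name in needed)

# Hypothetical fact table: 20 columns of 8 bytes each, uncompressed.
widths = {f"c{i}": 8 for i in range(20)}
n_rows = 1_000_000_000

row_bytes = bytes_scanned_row_store(n_rows, widths)
col_bytes = bytes_scanned_column_store(n_rows, widths, ["c3", "c7"])
print(row_bytes // col_bytes)
```

<p>Under these assumptions a query that touches 2 of the 20 columns reads a tenth of the bytes a full row scan would, before compression is even considered.</p>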
<p>Infobright combines a column-oriented design with something else that everyone should like: high levels of data compression. Because it’s column-oriented, Infobright compresses each column separately, which is normally more efficient than standard row-based compression because the compression algorithms can be finely tuned for each column datatype. In normal row-based compression, you do well to see 2 or 3-to-1 compression; with Infobright, 10-to-1 compression is the norm (with some cases going much higher), so a 1TB database can be compressed down to 100GB. When some of the members of the Sun performance team saw this in action, they remarked to me how surprised they were at how well Infobright compressed data vs. many of the other databases (row-based and column-based) they had worked with. Of course, compression helps not only on the performance side but also on the storage cost side of the house, which should please most IT managers.</p>
<h2>Other Infobright Technical Benefits</h2>
<p>Although you can load data into a MySQL Infobright table via all the standard routes, Infobright also provides a special high-speed loader that really makes a difference in pumping data into a database. The loading algorithm is multi-threaded, which allows each column to be loaded in parallel. Single table loads with the Enterprise edition of Infobright run typically around 100GB an hour and multi-table loads can accomplish nearly 300GB an hour for binary data, which isn’t too shabby. The Community edition of Infobright has slower load speeds as it only handles text-based input (as opposed to binary), but you’ll still see loads around 40GB or so an hour.</p>
<p>But perhaps the major technical star in the Infobright architecture is its “knowledge grid” and accompanying optimizer. Infobright begins construction of the knowledge grid at the time data is loaded into the database (either initially or incrementally). The knowledge grid is essentially a statistical description of the data and demographics in the database. Here’s a key thing to remember about the knowledge grid: it serves as a substitute for indexes, which means you never create an index on an Infobright table. Never. And this is a good thing!</p>
<p>The Infobright knowledge grid does not have the maintenance disadvantages of indexes, which as we all know, cause insert and update response times to degrade over time as more and more modifications are made to a database. Another advantage is that Infobright does extremely well at servicing unpredictable queries, which are exactly what many data warehouses have to deal with. Such queries are a DBA’s nightmare because they can never design an efficient indexing or partitioning strategy, and performance is never that great. But with Infobright, such problems evaporate because it does all the work dynamically for you.</p>
<p>The actual data in Infobright is stored as columns as previously mentioned. The columns themselves are divided into groups of 64K values called data packs, whose metadata is stored in the knowledge grid. When a query is submitted to Infobright for execution, the optimizer consults the knowledge grid in order to generate a rough idea of which data packs contain data needed for the result set of the query. Queries run exceptionally fast when (a) there are relatively few data packs containing data in the result set, and (b) the optimizer is able to accurately identify the set of needed data packs.</p>
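<p>To make the role of the data packs concrete, here is a minimal sketch of how per-pack metadata can rule packs in or out for a range predicate. It assumes the metadata is just a minimum, maximum and row count per pack; the <code>PackStats</code> class, the function names and the three category labels are illustrative choices, and the real knowledge grid holds more than this.</p>

```python
from dataclasses import dataclass

PACK_SIZE = 65536  # 64K values per data pack, as described in the text

@dataclass
class PackStats:
    """Hypothetical per-pack metadata; the real knowledge grid holds more."""
    lo: int
    hi: int
    count: int

def build_stats(column, pack_size=PACK_SIZE):
    """Split a column into packs and record min/max/count for each pack."""
    stats = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        stats.append(PackStats(min(pack), max(pack), len(pack)))
    return stats

def classify(stats, lo, hi):
    """Sort pack numbers into irrelevant, relevant and suspect for lo..hi."""
    irrelevant, relevant, suspect = [], [], []
    for i, s in enumerate(stats):
        if s.hi < lo or s.lo > hi:
            irrelevant.append(i)   # no value can match: skip the pack
        elif lo <= s.lo and s.hi <= hi:
            relevant.append(i)     # every value matches: metadata suffices
        else:
            suspect.append(i)      # must be decompressed and scanned
    return irrelevant, relevant, suspect

# A column loaded in sorted order, e.g. dates stored as day numbers.
column = list(range(400_000))
stats = build_stats(column)
irrelevant, relevant, suspect = classify(stats, 70_000, 200_000)
print(len(stats), irrelevant, relevant, suspect)
```

<p>In this sketch only two of the seven packs have to be opened, which mirrors conditions (a) and (b) above: few packs hold matching data, and the metadata is precise enough to identify them.</p>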
<p>One way to think of this data distribution is something akin to auto data partitioning. And again, this is a good thing. With Infobright, you don’t have to be a data warehouse design god to get an exceptionally performing data warehouse as it does nearly all the hard work for you – no indexing or partitioning strategies are necessary on your part.</p>
<p>In some cases a query can be executed without looking at any data packs at all; only the knowledge grid is consulted, and such queries will execute instantaneously. Since the knowledge grid contains many aggregate values, and because aggregates are a common aspect of queries in data warehousing applications, it is not unusual for many data warehousing type queries to execute with little or no required processing (examples follow below).</p>
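<p>A small sketch shows why such queries can return without reading any data. It assumes, for illustration only, that each pack's metadata records a row count, minimum, maximum and sum; which aggregates Infobright actually keeps is not stated here, and the function names are made up.</p>

```python
PACK_SIZE = 65536  # 64K values per data pack, as described in the text

def pack_metadata(column, pack_size=PACK_SIZE):
    """Hypothetical metadata: row count, min, max and sum for each pack."""
    meta = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        meta.append({"count": len(pack), "min": min(pack),
                     "max": max(pack), "sum": sum(pack)})
    return meta

def aggregates_from_metadata(meta):
    """Answer COUNT, MIN, MAX, SUM and AVG without touching any pack data."""
    count = sum(m["count"] for m in meta)
    total = sum(m["sum"] for m in meta)
    return {"count": count,
            "min": min(m["min"] for m in meta),
            "max": max(m["max"] for m in meta),
            "sum": total,
            "avg": total / count}

column = [(i * 37) % 1000 for i in range(200_000)]
meta = pack_metadata(column)
answer = aggregates_from_metadata(meta)
print(len(meta), answer["count"], answer["min"], answer["max"])
```

<p>The work is proportional to the number of packs rather than the number of rows, so an unfiltered aggregate over a billion-row column would touch only about fifteen thousand metadata entries.</p>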
<p>Application scenarios where Infobright tends to shine are star schemas (although true star design isn’t required) with deep, wide fact tables and shallow dimension tables, and designs where the tables in the databases are largely denormalized. Use cases where Infobright may not perform as well include applications with highly normalized schemas and more random data distributions. This is because such data doesn&#8217;t compress as well as data with clustered patterns and because the data in the query result sets are spread around the database so that large numbers of data packs need to be scanned.</p>
<p>So enough talk – let’s now exercise the Infobright engine across some standard data warehouse use cases and see how well it does. All the tests below were executed on a Dell PowerEdge 6850, with four Intel Xeon dual-core processors (3.4 GHz), 32GB of RAM, and five 300GB internal drives configured in RAID 10 style, running on 64-bit Red Hat Enterprise Linux 5 with Infobright Enterprise edition.</p>
<h2>Kicking the Tires</h2>
<p>Working with Infobright is pretty much the same as any other MySQL engine when it comes to creating tables – all you do is specify <code>brighthouse</code> as the engine type. For example:</p>
<pre>mysql&gt; create table t (c1 int) engine=brighthouse;
Query OK, 0 rows affected (0.02 sec)

mysql&gt; insert into t values (1), (2), (3);
Query OK, 3 rows affected (0.16 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql&gt; select * from t;
+------+
| c1   |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)</pre>
<p>So now onto the actual tests: the schema I used for the queries below was a standard data warehousing star schema (representing a car sales database) that can be depicted in a data model as follows:</p>
<p><img src="http://dev.mysql.com/common/images/datawarehouse_mysql_infobright.jpg" alt="Data Warehousing with MySQL and Infobright" /></p>
<p>The number of rows and overall size of the database (as calculated from the <code>information_schema</code>) is as follows:</p>
<pre>+----------------------------+-------------+------------+--------------+
| table_name                 | engine      | table_rows | data_length  |
+----------------------------+-------------+------------+--------------+
| fact_sales5                | BRIGHTHOUSE | 8080000000 | 135581094511 |
| fact_sales                 | BRIGHTHOUSE | 1000000000 |  16789624027 |
| fact_sales1b               | BRIGHTHOUSE | 1000000000 |  17424704919 |
| mthly_sales_by_dealer_make | BRIGHTHOUSE |    4207788 |     43958330 |
| dim_vins                   | BRIGHTHOUSE |    2800013 |     15251819 |
| dim_sales_area             | BRIGHTHOUSE |      32765 |       302326 |
| dim_dates                  | BRIGHTHOUSE |       4017 |         9511 |
| dim_dealers                | BRIGHTHOUSE |       1000 |         9631 |
| dim_dealers2               | BRIGHTHOUSE |       1000 |        10222 |
| dim_cars                   | BRIGHTHOUSE |        400 |         4672 |
| dim_msa                    | BRIGHTHOUSE |        371 |         3527 |
| tt                         | BRIGHTHOUSE |          1 |          193 |
+----------------------------+-------------+------------+--------------+
12 rows in set (0.00 sec)

+------------------+
| sum(data_length) |
+------------------+
|     169854973688 |
+------------------+
1 row in set (0.01 sec)</pre>
<p>The above shows a couple of fairly big fact tables at 1 billion rows each, a larger historical fact table at a little over 8 billion rows, one medium-sized summary table (4 million rows), and a number of dimension tables that are fairly small (except the 2.8 million row dim_vins table). The total physical size of the database is almost 170GB, while the raw size of the data was 1TB, a compression ratio of roughly 6-to-1, so you can see the Infobright compression in action.</p>
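<p>The arithmetic behind those figures is easy to reproduce. The sketch below uses the data_length values from the listing; treating “1TB” as 10**12 bytes is an assumption, since the exact raw size is not given.</p>

```python
# data_length values copied from the information_schema listing above.
data_lengths = [
    135581094511, 16789624027, 17424704919, 43958330, 15251819, 302326,
    9511, 9631, 10222, 4672, 3527, 193,
]
compressed_bytes = sum(data_lengths)

# Assumption: "1TB" of raw data is read as 10**12 bytes.
raw_bytes = 10**12

ratio = raw_bytes / compressed_bytes
saved = 1 - compressed_bytes / raw_bytes
print(compressed_bytes, round(ratio, 1), round(saved * 100))
```

<p>The sum matches the sum(data_length) figure reported by the server, and the ratio comes out a little under 6-to-1.</p>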
<p>Checking a few row counts confirms the above numbers:</p>
<pre>mysql&gt; select count(*) from fact_sales5;
+------------+
| count(*)   |
+------------+
| 8080000000 |
+------------+
1 row in set (0.00 sec)

mysql&gt; select count(*) from fact_sales;
+------------+
| count(*)   |
+------------+
| 1000000000 |
+------------+
1 row in set (0.00 sec)</pre>
<p>Notice that Infobright responds just like MyISAM to full COUNT(*) queries; the knowledge grid knows how many rows are in each table, so you won’t waste time finger-tapping while waiting for a response from such queries.</p>
<p>Now, let’s issue a few analytic queries and see what we get. First, let’s do a simple join to find out how much the dealership made over a certain time period for one make of car:</p>
<pre>mysql&gt; select sum(dlr_trans_amt)
    -&gt;     from fact_sales a, dim_cars b
    -&gt;     where a.make_id = b.make_id and
    -&gt;     b.make_name = 'ACURA' and
    -&gt;     b.model_name = 'MDX' and
    -&gt;     trans_date between '2007-01-01' and '2007-01-31';
+--------------------+
| sum(dlr_trans_amt) |
+--------------------+
|        11264027726 |
+--------------------+
1 row in set (24.98 sec)</pre>
<p>Not too bad at all. But now let’s put the knowledge grid / data packs to the test and see how big a dent in our response time we get by adding eight times more data to the mix:</p>
<pre>mysql&gt; select sum(dlr_trans_amt)
    -&gt;     from fact_sales5 a, dim_cars b
    -&gt;     where a.make_id = b.make_id and
    -&gt;     b.make_name = 'ACURA' and
    -&gt;     b.model_name = 'MDX' and
    -&gt;     trans_date between '2007-01-01' and '2007-01-31';
+--------------------+
| sum(dlr_trans_amt) |
+--------------------+
|        11264027726 |
+--------------------+
1 row in set (27.20 sec)</pre>
<p>Nice! Infobright was able to – again – only examine the needed data packs and exclude all the other data it didn’t need to look for to satisfy our query, with no real practical impact to the overall response time (running the same query again actually came in under the first query that used the smaller fact table).</p>
<p>Let’s now try something that the general MySQL Server will struggle with in some cases – nested subqueries:</p>
<pre>mysql&gt; select avg(dlr_trans_amt)
    -&gt; from   fact_sales
    -&gt; where  trans_date between '2007-01-01' and '2007-12-31' and
    -&gt;        dlr_trans_type = 'SALE' and make_id =
    -&gt;  (select make_id
    -&gt;   from dim_cars
    -&gt;   where make_name = 'ASTON MARTIN' and
    -&gt;         model_name = 'DB7') and
    -&gt;         sales_area_id in
    -&gt;            (select sales_area_id
    -&gt;             from dim_sales_area
    -&gt;             where sales_state =
    -&gt;               (select dealer_state
    -&gt;                from dim_dealers
    -&gt;                where dealer_name like 'BHUTANI%'));
+--------------------+
| avg(dlr_trans_amt) |
+--------------------+
|    45531.444471505 |
+--------------------+
1 row in set (50.78 sec)</pre>
<p>Infobright plows through the data just fine. What about UNION statements – oftentimes these can cause response issues with MySQL. Let’s try both fact tables this time:</p>
<pre>mysql&gt; (select avg(dlr_trans_amt), avg(sales_commission), avg(sales_discount)
    -&gt; from fact_sales
    -&gt; where trans_date between '2007-01-01' and '2007-01-31')
    -&gt; union all
    -&gt; (select avg(dlr_trans_amt), avg(sales_commission), avg(sales_discount)
    -&gt; from fact_sales
    -&gt; where trans_date between '2007-02-01' and '2007-02-28');
+--------------------+-----------------------+---------------------+
| avg(dlr_trans_amt) | avg(sales_commission) | avg(sales_discount) |
+--------------------+-----------------------+---------------------+
|   45550.1568209903 |               5.39966 |     349.50289769532 |
|   45549.5774942714 |               5.39976 |    349.498835301098 |
+--------------------+-----------------------+---------------------+
2 rows in set (0.49 sec)

mysql&gt; (select avg(dlr_trans_amt), avg(sales_commission), avg(sales_discount)
    -&gt; from fact_sales5
    -&gt; where trans_date between '2007-01-01' and '2007-01-31')
    -&gt; union all
    -&gt; (select avg(dlr_trans_amt), avg(sales_commission), avg(sales_discount)
    -&gt; from fact_sales5
    -&gt; where trans_date between '2007-02-01' and '2007-02-28');
+--------------------+-----------------------+---------------------+
| avg(dlr_trans_amt) | avg(sales_commission) | avg(sales_discount) |
+--------------------+-----------------------+---------------------+
|   45550.1568209903 |               5.39966 |     349.50289769532 |
|   45549.5774942714 |               5.39976 |    349.498835301098 |
+--------------------+-----------------------+---------------------+
2 rows in set (0.75 sec)</pre>
<p>It appears the UNIONs were satisfied via knowledge grid access alone. Next, let’s try a few joins coupled with a having clause and ask for the average Aston Martin dealer transaction amounts over one year for dealers in the state of Indiana:</p>
<pre>mysql&gt; select fact.dealer_id,
    -&gt;        avg(fact.dlr_trans_amt)
    -&gt; from   fact_sales fact
    -&gt; inner  join dim_cars cars on (fact.make_id = cars.make_id)
    -&gt; inner  join dim_sales_area sales on
    -&gt;        (fact.sales_area_id = sales.sales_area_id)
    -&gt; where  fact.trans_date between '2007-01-01' and '2007-12-31' and
    -&gt;        fact.dlr_trans_type = 'SALE' and
    -&gt;        cars.make_name = 'ASTON MARTIN' and
    -&gt;        cars.model_name = 'DB7' and
    -&gt;        sales.sales_state = 'IN'
    -&gt; group  by fact.dealer_id
    -&gt; having avg(fact.dlr_trans_amt) &gt; 50000
    -&gt; order  by fact.dealer_id desc;

.
.
.
|         2 |         51739.181818182 |
|         1 |                 57964.8 |
+-----------+-------------------------+
317 rows in set (50.66 sec)</pre>
<p>Of course, there are plenty of other queries that could be tested, but the above will give you a feel for how Infobright performs on some typical analytic-styled queries. And again, one of the great things is that you don’t have to spend time designing indexing or partitioning schemes to get performance results like those shown above, because none of that is necessary in Infobright. In fact, there are only around three tuning parameters for the engine and they’re all memory related.</p>
<h2>Infobright Limitations</h2>
<p>There are certain limitations you need to be aware of right now with Infobright. Not all queries can be satisfied with the Infobright optimizer; those that can’t end up being sent over to the MySQL optimizer. If that happens you’ll receive a warning after your query executes that states that such a thing occurred.</p>
<ul>
<li>Infobright can handle up to 32 concurrent queries at this time.</li>
<li>A query is currently constrained to one CPU/Core.</li>
<li>Correlated subqueries are supported, but typically aren’t run that efficiently.</li>
<li>DML (<code>insert</code>, <code>update</code>, <code>delete</code>; only available in the Enterprise edition) only supports table-level locking, which could reduce concurrency if much DML occurs in an Infobright warehouse.</li>
<li>Theoretically, an Infobright table can go up to 147 trillion rows, but practically, 50 billion is the limit today – a count which greatly depends on the data row size and datatypes used (i.e. the row limit could go higher depending on the row size).</li>
<li>Internationalization support is lacking right now; UTF8 support is planned for the first half of 2009.</li>
<li>Right now Infobright supports Windows only for ICE (the Community edition) and Solaris only for IEE (the Enterprise edition). Pre-built VMware-based VMs are also available for other environments.</li>
<li>No <code>ALTER TABLE</code> support. You cannot switch from other tables to Infobright or vice versa.</li>
<li>The knowledge grid can’t process non-Infobright tables well, so if a query mixes Infobright tables with tables from other storage engines, expect your performance to take a hit.</li>
</ul>
<p>I also hit a small bug on two of the queries above, where substituting the 8 billion row table for the one billion row table caused a performance hit (everything else being equal). Apparently, there is a known bug in the sorting algorithm that relates to the source data and only shows up with large tables. Infobright is in the process of correcting it.</p>
<p>Right now in terms of operating system and hardware support, the Infobright storage engine runs on 32-bit (Community edition only) and 64-bit (Community and Enterprise editions) Intel and AMD Red Hat Enterprise Linux, CentOS, Fedora (Community edition only) and Debian with standard commodity hardware. And all key BI tools (Business Objects, Cognos, Pentaho, Jaspersoft, etc.) support the MySQL and Infobright combination.</p>
<p>Reference:- <a href="http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html">http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/data-warehousing-with-mysql-and-infobright/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multiple Buffer Pools in MySQL 5.5</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/multiple-buffer-pools-in-mysql-5-5/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/multiple-buffer-pools-in-mysql-5-5/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 06:18:27 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=76</guid>
		<description><![CDATA[In our work to improve MySQL scalability we tested many opportunities to scale the MySQL Server and the InnoDB storage engine. The InnoDB buffer pool was often one of the hottest contention points in the server. This is very natural since every access to a data page, UNDO page, index page uses the buffer pool]]></description>
			<content:encoded><![CDATA[<p>In our work to improve MySQL scalability we tested many opportunities to scale the MySQL Server and the InnoDB storage engine. The InnoDB buffer pool was often one of the hottest contention points in the server. This is very natural, since every access to a data page, UNDO page or index page goes through the buffer pool, and even more so when those pages are read from or written to disk.</p>
<p>As usual, there are two ways to reduce contention on the buffer pool mutex. One is to split the mutex functionally into several mutexes, each protecting a different part of the buffer pool. There have been experiments in particular on splitting out the buffer pool page hash table and the flush list; other parts that have been broken out in experiments are the LRU list, the free list and other internal data structures. The other way is to split the buffer pool itself into multiple buffer pools. Interestingly, the two approaches can also be combined. The advantage of multiple buffer pools is that a buffer pool operation very rarely needs to grab more than one mutex, whereas that quickly becomes necessary once a single buffer pool is split into multiple mutex protection areas.</p>
<p>After working on scalability improvements in MySQL and InnoDB I noted that all the discussion was about how to split the buffer pool mutex, and none of it centered on how to make multiple buffer pools out of the buffer pool. I decided to investigate how difficult this change would be, and quickly realised that it needed a thorough walk-through of the code: about 150 methods and their interactions had to be checked. This sounds like a very big task, but fortunately the InnoDB code is well structured and has fairly simple dependencies between its methods. The walk-through showed that there were 3 different ways of getting hold of the buffer pool. The first is to calculate it from the space id and page id, which is the normal approach in most methods of the external buffer pool interface. However, there were numerous occasions where we only had access to the block or page data structure, and it would be wasteful to recalculate the hash value in every method that needed the buffer pool data structure, so it was decided to keep a reference to the buffer pool in every page data structure. Finally, there were a few occasions where one needed to access all buffer pools.</p>
<p>The analysis showed that most accesses to the buffer pool were completely independent of accesses for other pages. InnoDB uses read-ahead and neighbour writes in the IO operations that are started from the buffer pool, and these always operate on an extent of 64 pages. Thus it made sense to map all 64 pages of an extent to the same buffer pool, to avoid having to operate on multiple buffer pools in every IO operation.</p>
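<p>The extent-based mapping can be sketched as follows. This is only an illustration of the idea, not the server's code: the function name and the hash mixing constant are made up, and the only properties taken from the text are that the space id and page number select the instance and that all 64 pages of an extent land in the same one.</p>

```python
EXTENT_PAGES = 64  # read-ahead and neighbour writes work on 64-page extents

def buffer_pool_index(space_id, page_no, n_instances):
    """Pick a buffer pool instance so that a whole extent shares one instance.

    The mixing below is a made-up stand-in for the server's real hash.
    """
    extent_no = page_no // EXTENT_PAGES          # ignore the position inside the extent
    fold = (space_id * 1_000_003) ^ extent_no    # hypothetical hash mix
    return fold % n_instances

n_instances = 8
# Pages 128..191 form one extent of tablespace 5; page 192 starts the next.
extent_pages = [buffer_pool_index(5, p, n_instances) for p in range(128, 192)]
neighbour_extent = buffer_pool_index(5, 192, n_instances)
print(set(extent_pages), neighbour_extent)
```

<p>Because the position inside the extent is dropped before hashing, a read-ahead or neighbour write never has to take the mutex of more than one buffer pool instance.</p>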
<p>With this design there were only a few occasions where it was necessary to operate on all buffer pools. One is when the log needs to know the page with the oldest LSN in the buffer pool. This operation now requires looping over all buffer pool instances and checking the minimum LSN of each. It is a fairly rare operation, so it isn&#8217;t a scalability issue.</p>
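<p>The loop over all instances is simple enough to sketch. The data layout here (one list of dirty-page LSNs per instance) and the function names are assumptions for illustration and do not reflect the server's internal structures.</p>

```python
def oldest_modification(flush_list):
    """Oldest LSN among dirty pages of one instance, or None if it is clean."""
    return min(flush_list) if flush_list else None

def oldest_lsn_all_instances(instances):
    """Loop over every instance and keep the smallest oldest-LSN found."""
    oldest = None
    for flush_list in instances:
        lsn = oldest_modification(flush_list)
        if lsn is not None and (oldest is None or lsn < oldest):
            oldest = lsn
    return oldest

# Hypothetical dirty-page LSNs for four buffer pool instances.
instances = [[9120, 9450], [], [8770, 9900, 9901], [9300]]
checkpoint_limit = oldest_lsn_all_instances(instances)
print(checkpoint_limit)
```

<p>The cost grows with the number of instances rather than the number of pages, which fits the observation that this rare operation is not a scalability concern.</p>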
<p>The other operation that has to loop over all buffer pools needed a bit more care: the background operation that flushes buffer pool pages to disk. A couple of problems need consideration here. First, pages must be flushed regularly from all buffer pool instances; second, it&#8217;s still important to flush neighbours. Given that many disks are fairly slow, spreading the flush load over many buffer pools in this manner can be problematic. This is an important consideration when deciding how many buffer pool instances to configure.</p>
<p>The default number of buffer pools is one (the innodb_buffer_pool_instances configuration variable), and for most small configurations with fewer than 8 cores it&#8217;s mostly a good idea not to increase this value. If you have an installation with 8 cores or more, you should also pay attention to the disk subsystem. Given that InnoDB often writes up to 64 neighbours in each operation and that flushing should happen each second, it makes sense to have a disk subsystem capable of 500 IO operations per second before using 8 buffer pool instances. This capacity can be set in the innodb_io_capacity configuration variable. One SSD drive, two fast hard drives or three slow ones should be capable of handling this.</p>
<p>In our experiments we have mostly used 8 buffer pools, although more can be useful at times. The main problem with many buffer pools is related to IO operations: it is important to keep the IO load in the MySQL server balanced.</p>
<p>Our analysis of using multiple buffer pool instances has shown some interesting facts. First, accesses to the buffer pools are in no way evenly spread out. This is not surprising given that, for example, the root page of an index is accessed very frequently. So when using sysbench with only one table, certain buffer pool instances will obviously receive many more accesses than others. Our experiments show that in sysbench with 8 buffer pools, the hottest buffer pool receives about one third of all accesses. Given that sysbench is a worst-case scenario for multiple buffer pools, most applications, which tend to use more tables and more indexes, should see a much more even load on the buffer pools.</p>
<p>So how much do multiple buffer pools improve the scalability of the MySQL Server? The answer, as usual, depends on the application, OS, hardware and so forth, but some general ideas can be drawn from our experiments. In sysbench with a load held entirely in main memory, so that the disk is only used for flushing data pages and logging, multiple buffer pools can improve throughput by up to 10%. In dbStress, the benchmark <a href="http://dimitrik.free.fr/">Dimitri</a> uses, we have seen improvements of up to 30%. The most likely reason is that dbStress uses more tables and avoids many other bottlenecks in the MySQL Server, which makes the buffer pool a worse bottleneck in dbStress than in sysbench. From the code it is also easy to see that the more IO operations the buffer pool performs, the more often the buffer pool mutex is acquired, and it is often held for longer as well. One example is the search for a free page on the LRU list every time a page is read from disk into the buffer pool.</p>
<p>Furthermore, using multiple buffer pools opens the way for many more improvements, and it doesn&#8217;t remove the possibility of splitting the buffer pool mutex even further.</p>
<p>Another way to show the importance of multiple buffer pools is the statistics on the buffer pool mutex. With one buffer pool, the buffer pool mutex had about 750k accesses per second in a sysbench test where the MySQL Server had access to 16 cores. 50% of those accesses met a mutex that was already held. The InnoDB mutex subsystem is well suited to the buffer pool mutex, which is held for very short durations, so spinning while waiting for it is very fruitful. Still, a mutex that is held 50% of the time makes the buffer pool mutex a limiting factor for the MySQL Server, and quite a few threads will often spend time queued waiting for it. Splitting the buffer pool into 8 instances means that, even in sysbench, the hottest buffer pool receives about one third of the 750k accesses, so its mutex should be held about one third of 50%, or roughly 17%, of the time. Our later experiments show that the hottest buffer pool mutexes are now held up to about 14-15% of the time, so the theory matches the real world fairly well. This means that the buffer pool is still a major factor in the MySQL scalability equation, but it is now more on par with the other bottlenecks in the MySQL Server.</p>
<p>The development of multiple buffer pools happened at a time when the MySQL and InnoDB teams could start working together. I was impressed by the willingness to cooperate and the competence in the InnoDB team that made it possible to introduce multiple buffer pools into MySQL 5.5. Our cooperation has continued since then and this has led to improvements in productivity for both teams. So for you as a MySQL user this spells good times going forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/multiple-buffer-pools-in-mysql-5-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ubuntu Upstart for automatic MySQL start and stop</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/ubuntu-upstart-for-automatic-mysql-start-and-stop/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/ubuntu-upstart-for-automatic-mysql-start-and-stop/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 06:14:47 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=73</guid>
		<description><![CDATA[Here at Recorded Future we use Ubuntu (running on Amazon EC2), but so far we have not explored Ubuntu Upstart that much. During the holidays I made an effort to get acquainted with Upstart and to implement proper MySQL start and stop with it. If you do not know Upstart, this is the way you start]]></description>
			<content:encoded><![CDATA[<p>Here at Recorded Future we use Ubuntu (running on Amazon EC2), but so far we have not explored Ubuntu Upstart that much. During the holidays I made an effort to get acquainted with Upstart and to implement proper MySQL start and stop with it.</p>
<p>If you do not know Upstart, this is the way you start and stop services in Ubuntu. It serves the same purpose as the old /etc/init.d scripts, but is a bit more structured and powerful. That said, Upstart is regrettably far from complete: although the functionality is much better and Upstart has some cool features, some things do not work that well. For one thing, the documentation, where it exists, is useless at best. Secondly, there is very limited ability to test and develop Upstart scripts, and this is made worse by the fact that the documentation is so bad. Another thing is that Upstart insists on stopping services, by default, by sending a brutal kill signal. Not good for databases, mostly.</p>
<p>The Upstart scripts you have are in the /etc/init directory. Unlike the old init.d scripts, you currently cannot disable a service in Upstart: if it is in /etc/init, it will be started at system start. That&#8217;s it. This is something that I am sure will be fixed, but for now, again, it is something we have to live with. Upstart scripts have the suffix .conf (don&#8217;t ask me why), so the default MySQL Upstart script, for example, is called /etc/init/mysql.conf.</p>
<p>In an Upstart script, there are Stanzas that determine what to do. Like the exec Stanza that runs a program for example. And you may then ask, when is it run? Startup? Shutdown? And the answer is startup. For shutting things down, as I said before, Upstart will by default just send a kill -9 signal.</p>
<p>The minimal startup script you can have, and this actually works in a reasonable way, is to just have one line with an exec stanza, like this:<br />
exec /usr/bin/mydaemon<br />
Which will start the daemon. For stopping the daemon, Upstart will send a -9 signal to the started process by default, and nothing more is needed in the Upstart script.</p>
<p>For MySQL, we need to make things a bit more complicated. The default mysql.conf Upstart script really is not good. For one thing, it will not do a controlled shutdown of MySQL (this is possible even if Upstart will eventually send a kill -9 anyway). Secondly, this script assumes that what we use is a standard Ubuntu-installed MySQL distribution, so if you have installed MySQL in /usr/bin/mysql5147 or something like that, you are out of luck.</p>
<p>So what I wanted to create was an Upstart script for MySQL that fulfilled these requirements:</p>
<ol>
<li>Starts MySQL automatically.</li>
<li>Waits for MySQL to be available before exiting.</li>
<li>Is configurable to support different MySQL install locations, data directories etc.</li>
<li>Does a clean shutdown of MySQL when stopping the MySQL service.</li>
</ol>
<p>Before I show you what I ended up with, I want to comment on points 2 and 4 above. With Upstart, you can define a script or command to run just before or after a service has been started or stopped, and this is what I use to wait for MySQL to become available, and to send a SIGTERM to the MySQL Server when stopping (which will do a clean MySQL shutdown).</p>
<p>So here we go, a complete MySQL Upstart script, the way I want it to work:</p>
<pre>
#
# MySQL Service for Recorded Future
#
description     "MySQL Server"
author          "Anders Karlsson, Recorded Future"

start on (net-device-up
          and local-filesystems
          and runlevel [2345])
stop on runlevel [016]

expect fork
kill timeout 2

# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007

exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf &gt;&gt; /tmp/x.out &amp;

post-start script
    loop=600
# Wait for MySQL to start.
    while [ $loop -gt 0 ]; do
        if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
            break
        fi
        loop=$(($loop - 1))
        sleep 1
    done
    exit 0
end script

# Send a soft SIGTERM to MySQL before Upstart kills it.
# A SIGTERM to mysqld causes a controlled shutdown.
pre-stop script
# No "exec" here: exec would replace this shell, so the wait loop below would never run.
    kill -TERM `cat $MYSQL_PIDFILE`

# Wait for MySQL to end. Flushing buffers and all.
    loop=600
    while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
        if [ -e $MYSQL_PIDFILE ] ; then
            loop=$((loop - 1))
            sleep 1
            continue
        fi
        break
    done
end script</pre>
<p>To be honest, this is not what I create for all our MySQL servers. Instead I used this to create a chef template. Chef is what we use for configuration management here (see http://www.opscode.com/ for more on chef), and here it is put to good use to generate an Upstart script for MySQL. The above is just an example.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/ubuntu-upstart-for-automatic-mysql-start-and-stop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimizing Queries on Cluster</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/optimizing-queries-on-cluster/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/optimizing-queries-on-cluster/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 06:09:19 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=71</guid>
		<description><![CDATA[On the cluster mailing list (cluster@lists.mysql.com , thread &#8220;slow selects&#8221;) there was recently a very good example of how to optimize queries on Cluster. Thanks to Nick Keefen for raising this problem and Jeff Sturm for the answer how to solve it! In short the problem is that the Optimizer in the MySQL server does]]></description>
			<content:encoded><![CDATA[<p>On the cluster mailing list (cluster@lists.mysql.com , thread &#8220;slow selects&#8221;) there was recently a very good example of how to optimize queries on Cluster. Thanks to Nick Keefen for raising this problem and Jeff Sturm for the answer how to solve it!</p>
<p>In short, the problem is that the Optimizer in the MySQL server does not get adequate statistics from the data nodes about table sizes, indexes etc. In some cases this leaves the Optimizer clueless about how to order tables in joins, and in some cases about which indexes are best to use.</p>
<p>So here is the problem that Nick highlighted:</p>
<p>When the tables are stored in MyISAM:<br />
mysql> SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) limit 3;<br />
3 rows in set (0.32 sec) </p>
<p>phpfox_user is about 100000 rows and phpfox_videos is 170000 rows large</p>
<p>Trying to run the same query on my cluster machine, i get<br />
3 rows in set (20.47 sec) </p>
<p>Why is this? Let&#8217;s look at the EXPLAIN of the queries:</p>
<p>MyISAM explain:</p>
<p>mysql&gt; explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) ORDER BY v.vid_time DESC LIMIT 3;</p>
<pre>
+----+-------------+-------+--------+---------------+---------+---------+---------------------+--------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                 | rows   | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+--------+----------------+
|  1 | SIMPLE      | v     | ALL    | vid_userid    | NULL    | NULL    | NULL                | 135025 | Using filesort |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | nktest.v.vid_userid |      1 |                |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+--------+----------------+
</pre>
<p>NDB explain:</p>
<p>mysql&gt; explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) ORDER BY v.vid_time DESC LIMIT 3;</p>
<pre>
+----+-------------+-------+------+---------------+------------+---------+-----------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key        | key_len | ref             | rows  | Extra                           |
+----+-------------+-------+------+---------------+------------+---------+-----------------+-------+---------------------------------+
|  1 | SIMPLE      | u     | ALL  | PRIMARY       | NULL       | NULL    | NULL            | 82124 | Using temporary; Using filesort |
|  1 | SIMPLE      | v     | ref  | vid_userid    | vid_userid | 4       | bb2_phpfox.u.id |     1 |                                 |
+----+-------------+-------+------+---------------+------------+---------+-----------------+-------+---------------------------------+
</pre>
<p>Note the additional &#8216;Using temporary&#8217; clause for NDB. Nearly all of the 20 seconds taken by the query fall at &#8216;copying to temporary table&#8217;.</p>
<p>The problem is that, in the NDB case, the Optimizer has changed the order in which the tables are joined!</p>
<p>Jeff Sturm replied to Nick with a remedy:</p>
<p>Looks like the query optimizer is putting the tables in the wrong order.<br />
(This isn&#8217;t uncommon in my experience, as NDB seems to have less<br />
information than other engines to optimize queries at runtime.)</p>
<p>Try modifying your query with STRAIGHT_JOIN syntax, i.e.:</p>
<p>SELECT v.vid_id, v.vid_title, u.user FROM phpfox_videos AS v<br />
STRAIGHT_JOIN phpfox_user AS u ON v.vid_userid = u.id ORDER BY<br />
v.vid_time DESC LIMIT 3;</p>
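<p>As a hedged aside, MySQL also accepts STRAIGHT_JOIN as a modifier directly after SELECT, which makes the optimizer join the tables in the order they are listed in the FROM clause. A sketch of the same query written that way, reusing the table names from the thread:</p>
<pre>
-- phpfox_videos is listed first, so it is read first.
SELECT STRAIGHT_JOIN v.vid_id, v.vid_title, u.user
FROM phpfox_videos AS v
JOIN phpfox_user AS u ON v.vid_userid = u.id
ORDER BY v.vid_time DESC
LIMIT 3;
</pre>
<p>Running EXPLAIN on either form is a quick way to confirm that v now appears before u in the plan.</p>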
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/optimizing-queries-on-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Core Features for Data Warehousing</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/core-features-for-data-warehousing/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/core-features-for-data-warehousing/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 06:05:07 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[Warehousing]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=68</guid>
		<description><![CDATA[In addition to MySQL&#8217;s various storage engines, the MySQL database server contains a number of core features that enable data warehousing. These include: Data/Index partitioning (range, hash, key, list, composite) in MySQL 5.1 and above No practical storage limits (1 tablespace=110TB) with automatic storage management Built-in Replication Strong indexing support (B-tree, fulltext, clustered, hash, GIS)]]></description>
			<content:encoded><![CDATA[<p>In addition to MySQL&#8217;s various storage engines, the MySQL database server contains a number of core features that enable data warehousing. These include:</p>
<p>Data/Index partitioning (range, hash, key, list, composite) in MySQL 5.1 and above<br />
No practical storage limits (1 tablespace=110TB) with automatic storage management<br />
Built-in Replication<br />
Strong indexing support (B-tree, fulltext, clustered, hash, GIS)<br />
Multiple, configurable data/index caches<br />
Pre-loading of data into caches<br />
Unique query cache (caches result set + query; not just data)<br />
Parallel data load<br />
Multi-insert DML<br />
Read-only tables<br />
Cost-based optimizer<br />
Wide platform support</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/core-features-for-data-warehousing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL in Data Warehousing &amp; Business Intelligence</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2011/06/mysql-in-data-warehousing-business-intelligence/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2011/06/mysql-in-data-warehousing-business-intelligence/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 06:04:22 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[Warehousing]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=66</guid>
		<description><![CDATA[According to Forrester Research, the amount of data businesses retain for analytic purposes is growing at a rate of 50% per year, and in some industries such as Web, ecommerce, retail, telecommunications and government, the growth rate is even higher. Just a few years ago, the data used for business intelligence purposes was stored in]]></description>
			<content:encoded><![CDATA[<p>According to Forrester Research, the amount of data businesses retain for analytic purposes is growing at a rate of 50% per year, and in some industries such as Web, ecommerce, retail, telecommunications and government, the growth rate is even higher. Just a few years ago, the data used for business intelligence purposes was stored in a centralized data warehouse and a few departmental data marts. But now, the skyrocketing demand for better business intelligence data has created a vast array of distributed data repositories that run throughout organizations, which has resulted in increased complexity and costs for businesses wishing to maximize their use of analytic data.</p>
<p>To mitigate these issues, leading modern businesses such as Los Alamos National Labs, MIT Lincoln Lab, Cox Communications, and others have selected MySQL to power their growing data warehouse infrastructure. The growth of MySQL in the area of data warehousing recently prompted Gartner Group to include MySQL in their 2006 Magic Quadrant for Data Warehouse DBMS Servers.</p>
<p>MySQL is uniquely designed to easily handle the most common data warehousing use cases:</p>
<p>Data Marts<br />
Traditional Data Warehouses<br />
Large Historical/Archive Data Warehouses<br />
Real Time Data Warehouses</p>
<p>MySQL offers other storage engines that can also be used for data warehousing. MySQL supports these key data warehousing features:</p>
<p>Data/Index partitioning (range, hash, key, list, composite) in MySQL 5.1 and above<br />
No practical storage limits with automatic storage management<br />
Built-in Replication<br />
Strong indexing support (B-tree, fulltext, clustered, hash, GIS)<br />
Multiple, configurable data/index caches<br />
Pre-loading of data into caches<br />
Unique query cache (caches result set + query; not just data)<br />
Parallel data load<br />
Multi-insert DML<br />
Read-only tables<br />
Cost-based optimizer<br />
Wide platform support</p>
<p>Native Storage Engines<br />
MySQL currently offers a number of its own native Storage Engines, including:</p>
<p>InnoDB<br />
MyISAM<br />
Cluster<br />
Federated<br />
Archive<br />
Merge<br />
Memory<br />
CSV<br />
Blackhole</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2011/06/mysql-in-data-warehousing-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Features added to MySQL 5.5</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2010/10/features-added-to-mysql-5-5/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2010/10/features-added-to-mysql-5-5/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 09:24:43 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=63</guid>
		<description><![CDATA[Improved scalability on multi-core CPUs. The trend in hardware development now is toward more cores rather than continued increases in CPU clock speeds, which renders “wait until CPUs get faster” a nonviable means of improving database performance. Instead, it is necessary to make better use of multiple cores to maximally exploit the processing cycles they]]></description>
			<content:encoded><![CDATA[<p>Improved scalability on multi-core CPUs. The trend in hardware development now is toward more cores rather than continued increases in CPU clock speeds, which renders “wait until CPUs get faster” a nonviable means of improving database performance. Instead, it is necessary to make better use of multiple cores to maximally exploit the processing cycles they make available. MySQL 5.5 takes advantage of features of SMP systems and tries to eliminate bottlenecks in MySQL architecture that hinder full use of multiple cores. The focus has been on InnoDB, especially locking and memory management.</p>
<p>InnoDB I/O subsystem changes enable more effective use of available I/O capacity.</p>
<p>Several modifications improve operation of MySQL Server on Solaris.</p>
<p>There is better access to execution and performance information. Diagnostic improvements include DTrace probes, expanded SHOW ENGINE INNODB STATUS output, and a new status variable.</p>
<p>Support for an interface for semisynchronous replication: A commit performed on the master side blocks before returning to the session that performed the transaction until at least one slave acknowledges that it has received and logged the events for the transaction. Semisynchronous replication is implemented through an optional plugin component. See Section 16.3.8, “Semisynchronous Replication”.</p>
<p>Support for the SQL standard SIGNAL and RESIGNAL statements. See Section 12.7.8, “SIGNAL and RESIGNAL”.</p>
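<p>A minimal sketch of SIGNAL inside a stored procedure. The procedure name and message are made up for illustration; SQLSTATE '45000' is the generic value for user-defined exceptions, and DELIMITER is a mysql client command rather than SQL.</p>
<pre>
DELIMITER //

-- Hypothetical procedure that rejects negative quantities.
CREATE PROCEDURE check_quantity(IN qty INT)
BEGIN
    IF qty &lt; 0 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Quantity must not be negative';
    END IF;
END//

DELIMITER ;

-- Raises the error defined above.
CALL check_quantity(-1);
</pre>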
<p>Support for Performance Schema, a feature for monitoring MySQL Server execution at a low level. See Chapter 20, MySQL Performance Schema.</p>
<p>Support for additional Unicode character sets: utf16, utf32, and utf8mb4. These character sets support supplementary Unicode characters; that is, characters outside the Basic Multilingual Plane (BMP). See Section 9.1.10, “Unicode Support”.</p>
<p>Enhancements to XML functionality, including a new LOAD XML statement.</p>
<p>Two new types of user-defined partitioning are supported: RANGE COLUMNS partitioning is an extension to RANGE partitioning; LIST COLUMNS partitioning is an extension to LIST partitioning. Each of these extensions provides two enhancements to MySQL partitioning capabilities:</p>
<p>It is possible to define partitioning ranges or lists based on DATE, DATETIME, or string values (such as CHAR or VARCHAR).</p>
<p>You can also define ranges or lists based on multiple column values when partitioning tables by RANGE COLUMNS or LIST COLUMNS, respectively. Such a range or list may refer to up to 16 columns.</p>
<p>For tables defined using these partitioning types, partition pruning can now optimize queries with WHERE conditions that use multiple comparisons between (different) column values and constants, such as a = 10 AND b &gt; 5 or a &lt; "2005-11-25" AND b = 10 AND c = 50.</p>
<p>For more information, see Section 17.2.1, “RANGE Partitioning”, and Section 17.2.2, “LIST Partitioning”.</p>
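<p>To make the RANGE COLUMNS description above concrete, here is a small sketch. The table and partition names are hypothetical; the point is only that a DATE column can be used directly as the partitioning column, and that EXPLAIN PARTITIONS can be used to check pruning.</p>
<pre>
-- Hypothetical table partitioned directly on a DATE column.
CREATE TABLE orders_by_date (
    order_id   INT NOT NULL,
    order_date DATE NOT NULL,
    region     VARCHAR(20) NOT NULL
)
PARTITION BY RANGE COLUMNS (order_date) (
    PARTITION p2009 VALUES LESS THAN ('2010-01-01'),
    PARTITION p2010 VALUES LESS THAN ('2011-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- The partitions column of the output shows which partitions are scanned.
EXPLAIN PARTITIONS
SELECT * FROM orders_by_date WHERE order_date &lt; '2010-01-01';
</pre>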
<p>It is now possible to delete all rows from one or more partitions of a partitioned table using the ALTER TABLE &#8230; TRUNCATE PARTITION statement. Executing the statement deletes rows without affecting the structure of the table. The partitions named in the TRUNCATE PARTITION clause do not have to be contiguous.</p>
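<p>A one-line sketch of the statement, assuming a hypothetical table sales_log with partitions named p0, p1 and p2:</p>
<pre>
-- Empties p0 and p2; p1 and the table definition are left untouched.
ALTER TABLE sales_log TRUNCATE PARTITION p0, p2;
</pre>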
<p>Key caches are now supported for indexes on partitioned MyISAM tables, using the CACHE INDEX and LOAD INDEX INTO CACHE statements. In addition, a key cache can be defined for and loaded with indexes from an entire partitioned table, or for one or more partitions. In the latter case, the partitions are not required to be contiguous.</p>
<p>The TO_SECONDS() function is added. This function converts a date or datetime expression to a number of seconds since the year 0. You may use this function in partitioning expressions, and partition pruning is supported for tables defined using such expressions.</p>
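<p>A short sketch of both uses. The table and partition names are hypothetical, and the partition boundary is only an example.</p>
<pre>
-- Seconds since year 0 for a given date.
SELECT TO_SECONDS('2009-11-29');  -- 63426672000

-- Hypothetical table partitioned on a DATETIME column via TO_SECONDS().
CREATE TABLE events (
    id         INT NOT NULL,
    created_at DATETIME NOT NULL
)
PARTITION BY RANGE (TO_SECONDS(created_at)) (
    PARTITION p2010 VALUES LESS THAN (TO_SECONDS('2011-01-01 00:00:00')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
</pre>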
<p>The following constructs are obsolete and have been removed in MySQL 5.5. Where alternatives are shown, applications should be updated to use them.</p>
<p>The log_bin_trust_routine_creators system variable (use log_bin_trust_function_creators).</p>
<p>The myisam_max_extra_sort_file_size system variable.</p>
<p>The record_buffer system variable (use read_buffer_size).</p>
<p>The sql_log_update system variable.</p>
<p>The table_type system variable (use storage_engine).</p>
<p>The FRAC_SECOND modifier for the TIMESTAMPADD() function.</p>
<p>The TYPE table option to specify the storage engine for CREATE TABLE or ALTER TABLE (use ENGINE).</p>
<p>The SHOW TABLE TYPES SQL statement (use SHOW ENGINES).</p>
<p>The SHOW INNODB STATUS and SHOW MUTEX STATUS SQL statements (use SHOW ENGINE INNODB STATUS and SHOW ENGINE INNODB MUTEX).</p>
<p>The SHOW PLUGIN SQL statement (use SHOW PLUGINS).</p>
<p>The LOAD TABLE &#8230; FROM MASTER and LOAD DATA FROM MASTER SQL statements (use mysqldump or mysqlhotcopy to dump tables and mysql to reload dump files).</p>
<p>The BACKUP TABLE and RESTORE TABLE SQL statements (use mysqldump or mysqlhotcopy to dump tables and mysql to reload dump files).</p>
<p>TIMESTAMP(N) data type: The ability to specify a display width of N (use without N).</p>
<p>The --default-character-set and --default-collation server options (use --character-set-server and --collation-server).</p>
<p>The --delay-key-write-for-all-tables server option (use --delay-key-write=ALL).</p>
<p>The --enable-locking and --skip-locking server options (use --external-locking and --skip-external-locking).</p>
<p>The --log-bin-trust-routine-creators server option (use --log-bin-trust-function-creators).</p>
<p>The --log-long-format server option.</p>
<p>The --log-update server option.</p>
<p>The --master-xxx server options to set replication parameters (use the CHANGE MASTER TO statement instead): --master-host, --master-user, --master-password, --master-port, --master-connect-retry, --master-ssl, --master-ssl-ca, --master-ssl-capath, --master-ssl-cert, --master-ssl-cipher, --master-ssl-key.</p>
<p>The --safe-show-database server option.</p>
<p>The --skip-symlink and --use-symbolic-links server options (use --skip-symbolic-links and --symbolic-links).</p>
<p>The --sql-bin-update-same server option.</p>
<p>The --warnings server option (use --log-warnings).</p>
<p>The --no-named-commands option for mysql (use --skip-named-commands).</p>
<p>The --no-pager option for mysql (use --skip-pager).</p>
<p>The --no-tee option for mysql (use --skip-tee).</p>
<p>The --position option for mysqlbinlog (use --start-position).</p>
<p>The --all option for mysqldump (use --create-options).</p>
<p>The --first-slave option for mysqldump (use --lock-all-tables).</p>
<p>The --config-file option for mysqld_multi (use --defaults-extra-file).</p>
<p>The --set-variable=var_name=value and -O var_name=value general-purpose options for setting program variables (use --var_name=value).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2010/10/features-added-to-mysql-5-5/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Efficient Use of concat_ws in Mysql query</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2010/09/efficient-use-of-concat_ws-in-mysql-query/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2010/09/efficient-use-of-concat_ws-in-mysql-query/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 05:31:45 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=28</guid>
		<description><![CDATA[Try out the following queries with concat_ws function: many a times we need to display Concatenated values for First name and Last name OR concatenated values for Addresses; generally we do this in PHP but that can be done in MySQL query as well Syntax: concat_ws(‘separator’, ‘string 1’, ‘string 2’, ‘string 3’, ‘string N’, ……)]]></description>
			<content:encoded><![CDATA[<p>Try out the following queries with concat_ws function: many a times we need to display Concatenated values for First name and Last name OR concatenated values for Addresses; generally we do this in PHP but that can be done in MySQL query as well</p>
<p>Syntax: concat_ws('separator', 'string 1', 'string 2', 'string 3', 'string N', ...)</p>
<p>1) SELECT concat_ws(' ', 'Firstname', 'Lastname')</p>
<p>2) SELECT concat_ws('&lt;br&gt;', 'Address Line 1', 'Address Line 2', 'City', 'State', 'Zipcode', 'Country')</p>
<p>3) SELECT concat_ws('&lt;br&gt;', 'Address Line 1', NULL, 'City', 'State', 'Zipcode', 'Country')</p>
<p>Note: in the above examples, instead of the string values you can use the table fields.</p>
<p>The IMPORTANT POINT to note in query #3 is that if any value in the list is NULL, the function simply ignores that value and returns the remaining values concatenated with the specified separator. In the above examples the separator is the string '&lt;br&gt;', but it can be any string.</p>
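<p>A small sketch contrasting this with plain CONCAT(), using literal values only. CONCAT() returns NULL as soon as any argument is NULL, while CONCAT_WS() skips NULL values but keeps empty strings.</p>
<pre>
SELECT CONCAT('Address Line 1', NULL, 'City');           -- NULL
SELECT CONCAT_WS(', ', 'Address Line 1', NULL, 'City');  -- 'Address Line 1, City'
SELECT CONCAT_WS(', ', 'Address Line 1', '', 'City');    -- 'Address Line 1, , City'
</pre>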
<p>Where and how to use:</p>
<p>SELECT concat_ws(' ', u.firstname, u.lastname) as fullname, concat_ws('&lt;br&gt;', u.address1, u.address2, u.city_name, s.state_name, u.zipcode, c.country_name) as fulladdress</p>
<p>FROM user_master u</p>
<p>LEFT JOIN state_master s ON s.state_id = u.state_id</p>
<p>LEFT JOIN country_master c ON c.countries_id = u.countries_id</p>
<p>WHERE user_id = '123'</p>
<p>Isn’t it helpful???</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2010/09/efficient-use-of-concat_ws-in-mysql-query/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Why use MySQL as a database?</title>
		<link>http://www.nileshpawar.com/mysqlexpert/2010/03/why-to-use-mysql-as-database/</link>
		<comments>http://www.nileshpawar.com/mysqlexpert/2010/03/why-to-use-mysql-as-database/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 10:12:01 +0000</pubDate>
		<dc:creator>Nilesh Pawar</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.nileshpawar.com/mysqlexpert/?p=26</guid>
		<description><![CDATA[1. High Performance A unique storage-engine architecture allows database professionals to configure the MySQL database server specifically for particular applications, resulting in excellent performance. Whether the intended application is a high-speed transactional processing system or a high-volume web site that services a billion queries a day, MySQL can meet the most]]></description>
			<content:encoded><![CDATA[<p>1. High Performance<br />
A unique storage-engine architecture allows database professionals to configure the MySQL database server specifically for particular applications, resulting in excellent performance. Whether the intended application is a high-speed transactional processing system or a high-volume web site that services a billion queries a day, MySQL can meet the most demanding performance expectations. With high-speed load utilities, distinctive memory caches, full-text indexes, and other performance-enhancing mechanisms, MySQL is well equipped for today&#8217;s critical business systems.</p>
<p>2. Scalability and Flexibility<br />
The MySQL database server scales from deeply embedded applications with a footprint of only 1 MB to massive data warehouses holding terabytes of information. Platform flexibility is a long-standing feature of MySQL, with all flavors of Linux, UNIX, and Windows supported. And, of course, the open source nature of MySQL allows complete customization for those who want to add their own features to the database server.</p>
<p>3. Robust Transactional Support<br />
MySQL offers one of the most powerful transactional database engines on the market. Features include complete ACID (atomic, consistent, isolated, durable) transaction support, unlimited row-level locking, distributed transaction capability, and multi-version transaction support where readers never block writers and vice-versa. Full data integrity is also assured through server-enforced referential integrity, specialized transaction isolation levels, and instant deadlock detection.</p>
<p>4. High Availability<br />
Rock-solid reliability and constant availability are hallmarks of MySQL, with customers relying on MySQL to guarantee around-the-clock uptime. MySQL offers a variety of high-availability options from high-speed master/slave replication configurations, to specialized Cluster servers offering instant failover, to third party vendors offering unique high-availability solutions for the MySQL database server.</p>
<p>5. Web and Data Warehouse Strengths<br />
MySQL is the de facto standard for high-traffic web sites because of its high-performance query engine, tremendously fast data insert capability, and strong support for specialized web functions like fast full-text searches. These same strengths also apply to data warehousing environments, where MySQL scales up into the terabyte range for either single servers or scale-out architectures. Other features such as main-memory tables, B-tree and hash indexes, and compressed archive tables that reduce storage requirements by up to 80 percent make MySQL a strong choice for both web and business intelligence applications.</p>
<p>6. Strong Data Protection<br />
Because guarding the data assets of corporations is the number one job of database professionals, MySQL offers security features designed to protect that data. For database authentication, MySQL provides powerful mechanisms for ensuring that only authorized users can access the database server, including the ability to block users down to the client-machine level. SSH and SSL support are also provided to ensure safe and secure connections. A granular object privilege framework ensures that users see only the data they should, and data encryption and decryption functions protect sensitive data from unauthorized viewing. Finally, backup and recovery utilities provided through MySQL and third-party software vendors allow for complete logical and physical backup as well as full and point-in-time recovery.</p>
<p>7. Comprehensive Application Development<br />
One of the reasons MySQL is the world&#8217;s most popular open source database is that it provides comprehensive support for every application development need. Within the database, support can be found for stored procedures, triggers, functions, views, cursors, ANSI-standard SQL, and more. For embedded applications, plug-in libraries are available to embed MySQL database support into nearly any application. MySQL also provides connectors and drivers (ODBC, JDBC, etc.) that allow all forms of applications to use MySQL as their data management server. Whether the language is PHP, Perl, Java, Visual Basic, or .NET, MySQL offers application developers everything they need to build database-driven information systems.</p>
<p>8. Management Ease<br />
MySQL offers exceptional quick-start capability with the average time from software download to installation completion being less than fifteen minutes. This rule holds true whether the platform is Microsoft Windows, Linux, Macintosh, or UNIX. Once installed, self-management features like automatic space expansion, auto-restart, and dynamic configuration changes take much of the burden off already overworked database administrators. MySQL also provides a complete suite of graphical management and migration tools that allow a DBA to manage, troubleshoot, and control the operation of many MySQL servers from a single workstation. Many third party software vendor tools are also available for MySQL that handle tasks ranging from data design and ETL, to complete database administration, job management, and performance monitoring.</p>
<p>9. Lowest Total Cost of Ownership<br />
By migrating current database-driven applications to MySQL, or using MySQL for new development projects, corporations are realizing cost savings that often stretch into seven figures. By using the MySQL database server and scale-out architectures built on low-cost commodity hardware, corporations are finding that they can achieve high levels of scalability and performance at a cost far below that of proprietary and scale-up software vendors. In addition, the reliability and easy maintainability of MySQL mean that database administrators don&#8217;t waste time troubleshooting performance or downtime issues, and can instead concentrate on higher-level tasks that involve the business side of data.</p>
<p>10. Easy Support<br />
You can easily get support from thousands of MySQL administrators worldwide, sometimes for free and sometimes for a very small charge. You can contact me anytime for MySQL support at nilesh.pawar@kaizenz.com or visit http://www.kaizenz.com</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nileshpawar.com/mysqlexpert/2010/03/why-to-use-mysql-as-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

