Thursday, April 1, 2010

Shinguz's Blog moves to company website

Hello dear readers,

I will move my blog to our company website: http://www.fromdual.com/blog

You can find my personal blog feed here: http://www.fromdual.ch/blog/41/feed

Thank you all, and see you there!

Regards,
Oli (aka Shinguz)

Monday, March 1, 2010

FromDual - The MySQL consulting company goes operational today!

Hello everybody,

One month earlier than planned, we have the great pleasure of announcing that the company called FromDual goes operational today!

We are excited about this step. It is a new era in our personal evolution to get back into full contact with customers and solve their real-life, day-to-day MySQL problems.

So we are happy to hear from you and to help you solve your individual MySQL problems...

You can find us at FromDual or you can drop us a line.

Regards,
Oli Sennhauser (aka Shinguz)
Senior MySQL Consultant at FromDual

About FromDual

FromDual provides neutral and vendor-independent MySQL consulting, training and other services around MySQL and its derivatives. The company concentrates on the individual needs of its customers and, in close co-operation with them, achieves the best results for their problems.

Our consultants have worked on many projects across Europe. We have been involved with small start-ups, medium-sized enterprises and huge, world-wide operating top-500 companies, solving their performance problems, developing architecture and design studies with them, answering their operational questions, and reviewing their backup/recovery concepts.

FromDual does on-site and remote consulting, remote emergency aid and helps its customers to fill MySQL staff gaps if needed.

The company is privately owned. Its HQ is close to Zurich in Switzerland.

Monday, February 15, 2010

Logging users to the MySQL error log

Problem

A customer recently showed up with the following problem:
With your guidelines [1] I am now able to send the MySQL error log to the syslog
and in particular to an external log server.
But I cannot see which user connects to the database in the error log.

How can I achieve this?

Idea

During the night, while I slept, my brain worked on this problem on its own, and in the morning it had prepared a possible solution.

What came out is the following:
  • We create a UDF which allows an application to write to the MySQL error log.
    See my previous article about this [2].
  • We specify in a simple SQL query how the string should look which we want to write to the MySQL error log file.
  • We use the init_connect [3] hook (= logon trigger) which MySQL provides to log the information to the error log.

How to solve it?

The UDF can be taken from [4]. Do not be confused by the version number; it worked fine with MySQL 5.1.42. Load the UDF into the MySQL database as described in the article, follow the little example there, and if it works, let's continue to the next step.
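Loading the UDF boils down to a CREATE FUNCTION statement. As a minimal sketch (the shared library name and the return type are assumptions here, please check the article [4] for the exact values):
  mysql> CREATE FUNCTION log_error RETURNS INTEGER SONAME 'log_error.so';
  mysql> SELECT log_error('[Security] test entry');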

The SQL query to form the MySQL error log string looks as follows:
  mysql> SELECT CONCAT('[Security] User ', USER(), ' logged in.');
And if executed with the function:
  mysql> SELECT log_error(CONCAT('[Security] User ', USER(), ' logged in.'));
it produces the following output to the MySQL error log file:
  shell> tail -n 1 error.log
100215 17:50:16 [Security] User oli@localhost logged in.
And now make this permanent for every user who does not have the SUPER privilege:
  #
  # my.cnf
  #
  [mysqld]
  init_connect = 'SELECT log_error(CONCAT("[Security] User ", USER(), " logged in."));'
Restart the database and it should work now (it could also work with just SET GLOBAL init_connect = ..., as sketched below).
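If you prefer to avoid the restart, a dynamic change like the following should work for all new connections (a minimal sketch, not tested here for side effects):
  mysql> SET GLOBAL init_connect = 'SELECT log_error(CONCAT("[Security] User ", USER(), " logged in."));';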

Caution

Please consider the MySQL documentation [3] and be aware of the following:
"Note that the content of init_connect is not executed for users that have the SUPER privilege."
Further, I want to warn you that I have NOT tested the impact of this method on stability and performance! Please test it carefully yourself and let me know if you find something, or also if it works smoothly for you.

This is part of the MySQL Auditing Package we are currently working on and hope to finish soon. If you are interested in this work, please let us know; our MySQL consultants are happy to help you implement your own MySQL auditing in your environment.

Literature

[1] MySQL reporting to syslog
[2] MySQL useful add-on collection using UDF
[3] MySQL documentation: init_connect
[4] UDF collection

Thursday, February 11, 2010

Can you trust your backup?

Today a customer with corrupted data files showed up. When we enquired a bit more he told us that he had a broken I/O controller. This is one of the worst things which can happen to you!

The reason is the following: When an I/O controller starts to die, it often does not happen immediately. The controller dies slowly, producing more and more corrupt data. When you just write data without checking or reading it, it can take days or even weeks until you discover the problem.

But the nasty thing is that even your backups are infected with the corrupted data. In the worst case, the corruption started long before your oldest still existing backup was made.

Fortunately, database systems are quite sensitive to data corruption and start to complain pretty early. So please take warning or error messages about data corruption seriously and try to find and solve the problem immediately!

What can we do against spreading data corruption?

  • Monitor logs (syslog, database error log, application log, etc.).
  • Ideally do physical AND logical backups. If your database is too big for a logical restore, you can redirect your logical backup to /dev/null. This way you can at least make sure that all the data can be read, and the mysqldump command should complain when it hits data corruption (this does not work for index corruption!). See the sketch after this list.
  • Think about a backup retention policy.
  • Think about 2 independent paths to recover your data (keep at least 2 good backups + all the binary logs).
  • Run a check on your backed-up files (myisamchk, innochecksum).
  • Test your backup frequently with a restore (ideally on a daily basis)!
Unfortunately, this last item is done much too rarely by MySQL users, and then they experience bad surprises when they have to do a restore in an emergency!
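As a minimal sketch of the checks mentioned in the list above (file names, paths and options are just examples, adapt them to your environment):
  # logical backup redirected to /dev/null: verifies that all data can still be read
  shell> mysqldump --all-databases --single-transaction > /dev/null
  # checks on the copied data files of a physical backup
  shell> myisamchk --check /backup/datadir/mydb/*.MYI
  shell> innochecksum /backup/datadir/ibdata1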

Have you EVER tested a restore of your backup? Please do! Especially if your data size is significantly bigger than your amount of RAM!

In a later article I will show you a concept for doing backups and testing the restores regularly, including some positive side effects on your development process.

If you need some more information or help about backup concepts, emergency restore or data recoveries please consider our consulting services.

Friday, January 29, 2010

What is CHECK TABLE doing with InnoDB tables?

Recently we had a case where a customer got some corrupted blocks in his InnoDB tables. His largest tables were quite big, about 30 to 100 Gbyte. Why he got these corrupted blocks we have not found out yet (broken disk?).

When you have corrupted blocks in InnoDB, it is mandatory to get rid of them again. Otherwise your database can crash suddenly.
If you are lucky, only "normal" tables are affected, so you can dump, drop, recreate and load them again as described in the InnoDB recovery procedure in the MySQL documentation [1].
If you are not so lucky, you have to recreate your complete database, or go back to an old backup and do a restore with Point-in-Time Recovery (PITR).

To find out if some tables are corrupted, MySQL provides two tools: the innochecksum utility [2] and the mysqlcheck utility [3]. Alternatively, you can run the CHECK TABLE command manually (which is what mysqlcheck uses).
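For example (database and table names are just placeholders):
  shell> mysqlcheck --check test mytable
  mysql> CHECK TABLE test.mytable;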

I wanted to know how CHECK TABLE works in detail, so I looked first at the MySQL documentation [4]. But unfortunately the MySQL documentation often does not go into much detail on such specific questions.

So I dug into the code. The interesting lines can be found in the files handler/ha_innodb.cc and row/row0mysql.c. In the following snippets I have cut out a lot of the details.

The function ha_innobase::check is the interface between the CHECK TABLE command and the InnoDB storage engine and does the call of the InnoDB table check:

// handler/ha_innodb.cc

int ha_innobase::check( THD* thd )
{

    build_template(prebuilt, NULL, table, ROW_MYSQL_WHOLE_ROW);

    ret = row_check_table_for_mysql(prebuilt);

    if (ret == DB_SUCCESS) {
        return(HA_ADMIN_OK);
    }

    return(HA_ADMIN_CORRUPT);
}

The function row_check_table_for_mysql does the different checks on an InnoDB table:

  • First it checks if the ibd file is missing.

  • Then the first index (dict_table_get_first_index) is checked on its consistency (btr_validate_index) by walking through all page tree levels. In InnoDB the first (primary) index is always equal to the table (= data).

  • If the index is consistent several other checks are performed (row_scan_and_check_index):

    • If entries are in ascending order.

    • If unique constraint is not broken.

    • And the number of index entries is calculated.

  • Then the next and all other (secondary) indexes of the table are done in the same way.

  • At the end a WHOLE Adaptive Hash Index check for ALL InnoDB tables (btr_search_validate) is done for every CHECK TABLE!

// row/row0mysql.c

ulint row_check_table_for_mysql( row_prebuilt_t* prebuilt )
{

    if ( prebuilt->table->ibd_file_missing ) {
        fprintf(stderr, "InnoDB: Error: ...", prebuilt->table->name);
        return(DB_ERROR);
    }

    index = dict_table_get_first_index(table);

    while ( index != NULL ) {

        if ( ! btr_validate_index(index, prebuilt->trx) ) {
            ret = DB_ERROR;
        }
        else {

            if ( ! row_scan_and_check_index(prebuilt, index, &n_rows) ) {
                ret = DB_ERROR;
            }

            if ( index == dict_table_get_first_index(table) ) {
                n_rows_in_table = n_rows;
            }
            else if ( n_rows != n_rows_in_table ) {

                ret = DB_ERROR;

                fputs("Error: ", stderr);
                dict_index_name_print(stderr, prebuilt->trx, index);
                fprintf(stderr, " contains %lu entries, should be %lu\n", n_rows, n_rows_in_table);
            }
        }

        index = dict_table_get_next_index(index);
    }

    if ( ! btr_search_validate() ) {
        ret = DB_ERROR;
    }

    return(ret);
}

A little detail which is NOT shown in the code above is that the fatal lock wait timeout is raised from 600 seconds (10 min) to 7800 seconds (2 h 10 min) for the duration of the check:

/* Enlarge the fatal lock wait timeout during CHECK TABLE. */
mutex_enter(&kernel_mutex);
srv_fatal_semaphore_wait_threshold += 7200; /* 2 hours */
mutex_exit(&kernel_mutex);

As far as I understand this has 2 impacts:
  1. CHECK TABLE for VERY large tables (> 200 - 400 Gbyte) will most probably fail because it will exceed the fatal lock timeout. This becomes more probable when you have bigger tables, slower disks, less memory or do not make use of your memory appropriately.

  2. Because srv_fatal_semaphore_wait_threshold is a global variable, the fatal lock wait timeout is raised for the whole system during every CHECK TABLE. Long-lasting InnoDB lock waits will be detected late or not at all while a long-running CHECK TABLE command is in progress.


Whether this is something that should be fixed to increase the reliability of the system I cannot judge; that is up to the InnoDB developers. But when you hit such symptoms during long-running CHECK TABLE commands, keep this in mind.
For the first finding I have filed a feature request [5]. This "problem" was introduced a long time ago with bug #2694 [6] in MySQL 4.0, Sep 2004. Thanks to Axel and Shane for their comments.
If you want to circumvent this situation, you either have to recompile MySQL with higher values or you can use the concept of a pluggable User Defined Function (UDF) which I have described earlier [7], [8], [9].

Another detail is that at the end of each CHECK TABLE command a check of the Adaptive Hash Indexes of all tables is done. I do not know how expensive it is to check all Adaptive Hash Indexes, especially when they are large, but more optimized code there could perhaps speed up the CHECK TABLE command by a small percentage.

This information is valid up to MySQL/InnoDB 5.1.41 and the InnoDB Plugin 1.0.5.

Literature

[1] Forcing InnoDB Recovery
[2] innochecksum — Offline InnoDB File Checksum Utility
[3] mysqlcheck
[4] CHECK TABLE
[5] Bug #50723: InnoDB CHECK TABLE fatal semaphore wait timeout possibly too short for big table
[6] Bug #2694: CHECK TABLE for Innodb table can crash server
[7] MySQL useful add-on collection using UDF
[8] Using MySQL User-Defined Functions (UDF) to get MySQL internal informations
[9] MySQL useful add-on collection using UDF

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

Friday, January 22, 2010

MySQL on VMware Workstation/DRBD vs. VMWare ESX Server/SAN

Or an active-active fail-over cluster à la VMware.

Today I learned about a totally crazy/cool looking architecture where the expensive VMware ESX Server was replaced by a free/cheap VMware Workstation version in combination with DRBD.

Basically, we call DRBD "the little man's SAN", and that is exactly what this customer is doing. He replaced the SAN with DRBD, and now he can easily move a VMware instance to the other host. It is possibly not as flexible and powerful as an ESX Server, but also not as expensive...

The architecture looks as follows:


According to this customer it runs stably on about a dozen installations, and they have not experienced any trouble during fail-overs.

Please let me know your experience, thoughts or concerns with this architecture...

PS: If you consider such an architecture, do not expect very good performance!

Wednesday, January 20, 2010

The battle against Oracle is probably over but has the real war begun yet?

According to different sources on the web, the Oracle - Sun merger will soon be approved by the European Commission. So at least in the West it is clear what is going on. Let us see what the East decides... [1], [2].

Oracle's arch-enemy Microsoft has already brought its weapons into position against the target with "Microsoft offers Oracle-phobes MySQL migration tool" [3], [4]. So far so good. Nothing new, nothing special.

What made me a bit edgy was the following Oracle blog series about their Oracle Warehouse Builder (OWB):
  • OWB 11gR2 – MySQL Open Connectivity [5]
  • OWB 11gR2 – MySQL Bulk Extract [6]
OWB seems to be a great tool to move data around from different sources, to mix them and to extract some useful results.

It looks like with the new 11gR2 release there "... were significant changes to mapping to support native heterogeneous connectivity to systems ...".

What interests me more is the MySQL-related part of it. In the second part of the series I found text passages like:

"Next on the MySQL series, let's look at bulk extract to file from MySQL. ... and also rather than just unloading to file, we can optimize for other systems such as Oracle and create Load Code Templates that extract in bulk from MySQL (or whatever) transfer to the Oracle system and create an external table as the staging table. ... Hopefully this gives you a taster for the capabilities of the bulk extract capabilities using MySQL as an illustration."

This makes me just a bit nervous. But hopefully I am just overreacting because of the current situation... So let us carefully watch how it continues!

PS: .oO(How would this sound instead: "... and also rather than just unloading Oracle data to file, we can optimize for MySQL and create Load Code Templates that extract in bulk from Oracle (or whatever) transfer to the MySQL system ..."?)

Literature

[1] Yahoo! NEWS
[2] The Wall Street Journal
[3] Microsoft offers Oracle-phobes MySQL migration tool
[4] Free Download: Microsoft SQL Server Migration Assistant
[5] OWB 11gR2 – MySQL Open Connectivity
[6] OWB 11gR2 – MySQL Bulk Extract