Thursday, January 26, 2012

Cleaning up the Windows WinSxS folder...

I recently noticed that my Windows 7 C: drive was almost full and found that the WinSxS folder under C:\Windows\ was more than 10GB! A big part of it is Microsoft Service Pack backup data that is not removed after the installation completes. It is kept for a reason, of course: so that you can uninstall the service pack later if you decide to. In my case, I don't think I will be uninstalling service packs, so I need to clean it up.

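Run the following in a Command Prompt window opened with "Run as administrator". It should get you back a good chunk of space, around 5GB or more on a system like mine. Keep in mind that after this cleanup the service pack can no longer be uninstalled.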

DISM /online /Cleanup-Image /SpSuperseded


Friday, January 6, 2012

Netezza: How It Works...

I was really surprised by how quickly the Netezza database responds compared with Oracle.
A query that was taking hours in Oracle takes only a few seconds in Netezza! That made me want to learn a little more about Netezza and how this thing works.

Here is some really interesting info that I grabbed from a white paper:

The Netezza Data Warehouse Appliance looks like a refrigerator that rolls into the Data Center. Each rack holds 12.5 Terabytes of storage. For more storage, just chain a bunch of these ‘refrigerator looking’ racks together. We believe it is simpler because all of the pieces are contained in a single rack. Many vendors on the market require a complement of software, hardware, database technology and networking to make their solutions work.


How can it load data at 500 Gigabytes an hour and retrieve data in seconds? It is due to its design.

DESIGN
Each Data Warehouse Appliance (refrigerator) contains 108 computers called Snippet Processing Units (SPUs). Each SPU is an integrated circuit board with a CPU, a 400-Gigabyte hard disk, memory and a 1-Gigabit Network Interface Card. This offers parallel processing across 108 computers inside each cabinet.

HOW IT WORKS
As the data is loaded into the Appliance, it intelligently separates each table across the 108 SPUs. Typically, the hard disk is the slowest part of a computer. Imagine 108 of these spinning up at once, loading a small piece of the table. This is how Netezza achieves a 500 Gigabyte an hour load time.
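To make the "separates each table across the 108 SPUs" idea concrete, here is a minimal Python sketch of hash distribution. It is only an illustration under assumptions: the function names are made up for this example, and CRC32 is a stand-in hash, not whatever Netezza uses internally.

```python
import zlib

NUM_SPUS = 108  # SPUs per cabinet, as described in the text


def spu_for_key(key: str, num_spus: int = NUM_SPUS) -> int:
    """Map a distribution key to an SPU number with a stable hash.

    CRC32 is an assumption for illustration, not Netezza's real hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_spus


def distribute(rows, key_column, num_spus=NUM_SPUS):
    """Split rows into one slice per SPU based on the key column."""
    slices = {spu: [] for spu in range(num_spus)}
    for row in rows:
        slices[spu_for_key(str(row[key_column]), num_spus)].append(row)
    return slices


# Hypothetical table of 10,000 orders, distributed on order_id.
rows = [{"order_id": i, "amount": i * 1.5} for i in range(10_000)]
slices = distribute(rows, "order_id")
sizes = [len(s) for s in slices.values()]
print("smallest slice:", min(sizes), "largest slice:", max(sizes))
```

Each SPU only has to write its own small slice, which is why the load can run on all disks at the same time.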

After a piece of the table is loaded and stored on each SPU (a computer on an integrated circuit card), each column is analyzed to gather descriptive statistics such as minimum and maximum values. These values are stored on each of the 108 SPUs and are used instead of indexes, which take time to create and update and take up extra space. Imagine your environment without the need to create indexes.

When it is time to query the data, a master computer inside the Appliance queries the SPUs to see which ones contain the required data. Only the SPUs that contain relevant data return information, so less information moves across the network to the Business Intelligence/Analytics Server.
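The min/max statistics described above (often called a zone map) can be sketched roughly as follows. This is a simplified Python illustration, assuming data is stored in fixed-size blocks; the `Block` class and function names are hypothetical and not part of any Netezza API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    rows: list
    min_val: int  # smallest value stored in this block
    max_val: int  # largest value stored in this block


def build_blocks(values, block_size):
    """Store values in blocks and record min/max for each block."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_query(blocks, low, high):
    """Return matching values and how many blocks had to be scanned."""
    hits, scanned = [], 0
    for block in blocks:
        # Skip the block entirely if its min/max range cannot match.
        if block.max_val < low or block.min_val > high:
            continue
        scanned += 1
        hits.extend(v for v in block.rows if low <= v <= high)
    return hits, scanned


values = list(range(1, 1001))  # e.g. sequential ids or dates
blocks = build_blocks(values, 100)
hits, scanned = range_query(blocks, 250, 260)
print(len(hits), "rows found,", scanned, "of", len(blocks), "blocks scanned")
# prints: 11 rows found, 1 of 10 blocks scanned
```

No index was built, yet nine of the ten blocks were never read. This works best when the data is naturally ordered, such as dates loaded in sequence.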

For joining data, it gets even better. The Appliance distributes data in multiple tables across multiple SPUs by a key. Each SPU contains partial data for multiple tables. It joins parts of each table locally on each SPU returning only the local result. All of the ‘local results’ are assembled internally in the cabinet and then returned to the Business Intelligence/Analytics Server as a query result. This methodology also contributes to the speed story.
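Here is a rough Python sketch of the local join described above, assuming both tables are distributed on the same key. Everything in it (table names, the modulo "hash", the 4 SPUs) is made up for illustration; the text describes 108 SPUs per cabinet.

```python
from collections import defaultdict

NUM_SPUS = 4  # kept small for illustration


def spu_for(key: int, num_spus: int = NUM_SPUS) -> int:
    """Toy distribution function for integer keys (an assumption)."""
    return key % num_spus


def distribute(rows, key_column, num_spus=NUM_SPUS):
    """Place each row on the SPU chosen by its distribution key."""
    parts = defaultdict(list)
    for row in rows:
        parts[spu_for(row[key_column], num_spus)].append(row)
    return parts


def local_join(customers, orders):
    """Join the two partial tables that live on one SPU."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {"customer": by_id[o["customer_id"]]["name"], "amount": o["amount"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


customers = [{"customer_id": i, "name": f"cust{i}"} for i in range(8)]
orders = [{"customer_id": i % 8, "amount": 10 * i} for i in range(20)]

# Both tables use the same key, so matching rows land on the same SPU.
cust_parts = distribute(customers, "customer_id")
order_parts = distribute(orders, "customer_id")

# Each SPU joins only its own data; the results are then assembled.
result = []
for spu in range(NUM_SPUS):
    result.extend(local_join(cust_parts[spu], order_parts[spu]))

print(len(result), "joined rows")  # prints: 20 joined rows
```

Because rows with the same key always sit on the same SPU, no SPU needs to fetch rows from another one to complete the join.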

The key to all of this is ‘less movement of data across the network’. The Appliance returns only the required data to the Business Intelligence/Analytics server across the organization’s 1000/100 Mb network. This is very different from traditional processing, where the Business Intelligence/Analytics software typically extracts most of the data from the database and processes it on its own server. With the Appliance, the database does the work of determining which data is needed and returns a smaller result subset to the Business Intelligence/Analytics server.

BACKUP AND REDUNDANCY
To understand how the data and system are set up for almost 100% uptime, it is important to understand the internal design. Each SPU uses the outer, fastest third of its 400-Gigabyte disk for data storage and retrieval. Another third stores descriptive statistics, and the remaining third stores a hot backup of data from other SPUs. Each Appliance cabinet also contains 4 additional SPUs for automatic failover of any of the 108 SPUs.

TO_DATE has no TIME in Netezza

When using the TO_DATE function in Netezza, please note that it does not return (or rather, consider) the TIME portion of the value. So queries normally written for Oracle that filter the data on a certain date and time will return the wrong result set in the Netezza database.

Use TO_TIMESTAMP instead to consider the TIME portion also.

For example:
Query in Oracle:
select to_date('20120106143029','yyyymmddhh24miss'), to_timestamp('20120106143029','yyyymmddhh24miss') from dual;

TO_DATE                  TO_TIMESTAMP
1/6/2012 2:30:29 PM      1/6/2012 2:30:29.000000000 PM

Query in Netezza:
select to_date('20120106143029','yyyymmddhh24miss'), to_timestamp('20120106143029','yyyymmddhh24miss') from _v_dual;


TO_DATE       TO_TIMESTAMP
1/6/2012      1/6/2012 2:30:29 PM
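To show why this matters for filters, here is a small Python sketch that mimics the two behaviors. It does not talk to either database; the helper names are made up, and "TO_DATE drops the time" is modeled simply as midnight of the same day.

```python
from datetime import datetime

FMT = "%Y%m%d%H%M%S"  # same layout as 'yyyymmddhh24miss'


def to_timestamp(text: str) -> datetime:
    """Keeps the time portion, like TO_TIMESTAMP."""
    return datetime.strptime(text, FMT)


def to_date_truncated(text: str) -> datetime:
    """Mimics a TO_DATE that drops the time: midnight of that day."""
    day = datetime.strptime(text, FMT).date()
    return datetime.combine(day, datetime.min.time())


# Three hypothetical events on the same day.
events = [
    to_timestamp(s)
    for s in ("20120106090000", "20120106143029", "20120106180000")
]
cutoff = "20120106143029"

with_time = [e for e in events if e >= to_timestamp(cutoff)]
without_time = [e for e in events if e >= to_date_truncated(cutoff)]

print(len(with_time), "rows with the time kept")      # 2 rows
print(len(without_time), "rows with the time dropped")  # 3 rows
```

The 9:00 AM row sneaks into the second result because the cutoff silently became midnight, which is the kind of wrong result set to watch for when porting Oracle queries.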