Block Level Compression Teradata 14

Page 1

Block Level Compression In Teradata 14.0

Block Level Compression In Teradata 14.0 Introducing Temperature Based Compression

By: Smarak Das Employee Id: 391485 Project: Kaiser Permanente Role: Teradata DBA 1


Block Level Compression In Teradata 14.0

1. INTRODUCTION Two key advantages of data compression include: (a) Less disk space required to physically store the data (b) Improved throughput when reading or writing to disk Teradata 13.10 introduced new ways to compress data in the database, bringing to 3 the total approaches possible: (a) Multi Value Compression (MVC) (b) Algorithmic Level Compression (ALC) (c) Block Level Compression (BLC) This Document focuses on Block Level Compression (BLC) and its enhanced functionalities introduced in Teradata 14.

2. AUDIENCE The targeted audience for this document is experienced Teradata Database administrators, application implementers, and Teradata associates who have a background in database management and implementation. This Document has been preapred by referencing several Teradata Guides and Orange Books.

2


Block Level Compression In Teradata 14.0 3. BLOCK LEVEL COMPRESSION BASICS Block level compression is a space-reduction feature, not a performance feature. All rows in each data block are included in a single act of compression. Once a table or some number of cylinders from a table are compressed, if a query needs to access even one column from a single row, the data block in its entirety must be decompressed. In order for the query to process the data, the compressed version of the data block will be brought into the FSG cache (the data block memory cache), and then decompressed into its original version. Even though one query decompresses a data block, the next query that accesses that same data block must decompress the data block again, even if the compressed version of the block stays in the cache. Decompressed data blocks cannot be shared across sessions. In addition, if the same session re-accesses a compressed data block, it will have to decompress the data block a second time.

When a row is modified, its entire data block must first be decompressed in memory. The change is made to the appropriate row, and then the entire data block is compressed once again. If fallback has been defined on the table, the fallback copy will also undergo decompression followed by recompression to accommodate update activity. Because there is overhead associated with these compression and decompression steps, BLC is typically targeted for tables that are infrequently accessed and even less frequently changed. These types of tables are often referred to as “cold data.�

3


Block Level Compression In Teradata 14.0 Block Level Compression can be applied as follows: (a) Empty Table: Using Query Band options or appropriate DBS Control Settings. (b) Non-Empty Table: Using Ferret Utility. Teradata currently uses the ZLIB library, based on the Lempel-Ziv algorithm for all software block level compression.

4. SPACE REDUCTION FUNDAMENTALS The primary purpose of block level compression is to reduce the space on disk required to physically store a table. It is strongly recommended that you use the maximum allowed block size (127.5 KB) for tables that will be compressed. This can be done as either a table level attribute (with CREATE/ALTER TABLE) or DBS Control’s File System. The figure below illustrates data block packing that happens during the compression process:

Factors affecting the Degree of Compression: (a) Individual data blocks within a table may have different reduction rates. (b) If MVC or ALC is defined on a table when it is compressed at the block level, then the degree of reduction from adding BLC will be less than it would have been if preexisting compression had not been present, but overall the compression level will be increased. (c) Fallback secondary index sub tables may be compressed, but primary secondary index data is not compressed. So if your table contains indexes before compression, the post-compression reduction rate will appear to be smaller than if it had no indexes at the time of compression. 4


Block Level Compression In Teradata 14.0 (d) Some data blocks are considered too small to provide any meaningful compression, and those blocks will be bypassed during the compression process. (e) The patterns of data within the data block will impact how effectively compression can be applied.

5. Using Manual BLC as it Exists in Teradata Database 13.10 Block level compression can be costly, particularly when it is applied to all data blocks of the primary and fallback copy of a table, as it is in Teradata 13.10. The three major costs attributed to compression come from:  Applying the initial compression (a one-time cost)  Uncompressing the data blocks when they are accessed  Recompressing the data blocks when they are modified The best candidates for block level compression are tables whose data blocks do not have to be decompressed very often. These would be tables that hold cold data (very infrequently accessed data) that are rarely or never updated (static tables). If actively-accessed tables are being considered for manual compression, then BLC will be a better fit for platforms that have plenty of excess CPU, to account for the additional overhead involved. Because of the increase in CPU required for decompression on actively-accessed tables, BLC applied on non-cold data is usually not recommended for environments that have high CPU utilization (80% or greater). The next chapter provides examples that illustrate this CPU overhead.

6. Temperature Based Block Level Compression When considering block level compression on a table, a new enhancement makes it possible to only compress data that is infrequently accessed, and leave the more frequently used data blocks uncompressed. Temperature based block level compression may be applied on a table-by-table basis, or by default on all user tables. When temperature based BLC is enabled, the Teradata Virtual Storage (TVS) infrastructure is used to determine the temperature of the data being held in disk storage. Temperature is an indication of how often the data is accessed. A cold temperature indicates the data is accessed less often than other data on the system, whereas a hot temperature indicates the data is accessed more often. This temperature determination performed by TVS is used. 5


Block Level Compression In Teradata 14.0

6.1 SETTING THE THRESHOLD FOR COMPRESSION One of the tasks performed by the TVS routines rates all cylinders by relative temperature, and then determines which cylinders are to be marked as cold. The threshold that determines compression depends on a DBS Control setting that is found in the Compression set of fields: TempBLCThresh = COLD (COLD, WARM, HOT, 0 – 100) Zero specifies the default, which is Cold. The lowest threshold that you can select is .001. The TempBLCThresh setting indicates the temperature threshold at which data will be compressed, when under the control of temperature based BLC. Any data that is colder than what the setting specifies will be targeted for compression. Here is what the different options mean:  COLD: By default, the lowest 20% of the allocated perm cylinders on the platform (this excludes spool tables), once they have been sorted by temperature (as described above), will be labeled as cold. If this option is selected, the 20% of cylinders (by default) with the lowest relative temperature will be compressed.  WARM: When specifying WARM, then all of the data cylinders that have been classified as both WARM and COLD will be compressed. Only the hot cylinders will remain uncompressed  HOT: Selecting this option will effectively compress all data on the system. If that is the desired result, the same result can be achieved with less overhead by avoiding use of the temperature based feature, and instead using manual block level compression. There is no current valid use case for this option. 6


Block Level Compression In Teradata 14.0  0 – 100: This option allows you to determine where you would like the threshold between compressed and non-compressed storage to be. The number that you specify here, which can be up 3 decimal places, will be applied to the sorted list of cylinder temperatures. For example, if you want 30% of your allocated cylinders (the coldest 30%) to be compressed, you could assign 30 for this parameter. Specifying a percent will only affect the Cold-Warm boundary, because only that boundary is considered when deciding the compression status of a cylinder.

Sequence of Activities performed as per Temperature-Based Block Level Compression: (a) TVS keeps track of access counts for each cylinder and periodically re-evaluates the access counts. AutoTempComp Background task of BLC will take the information provided by the TVS routines and determine which cylinders should be compressed and which should be uncompressed. The task, as is true with all file system activity, runs independently on each AMP. This means that until a steady state is reached across the system, different portions of the same table will be compressed or not compressed on different AMPs. (b) Access counts are fed into a formula which assigns a number for each cylinder that represents its relative temperature compared to other cylinders. This list of numbers has a range into the billions and is capable of representing a high degree of differentiation between cylinders. 7


Block Level Compression In Teradata 14.0 (c) This list of numerically-based cylinder temperatures are sorted low to high, and two boundaries are set based on pre-defined thresholds: 1.) The Cold-Warm boundary; and 2.) The Warm-Hot boundary. The boundary that demarcates cold data from warm data is most relevant with temperature based BLC because that boundary separates cylinders that need to be compressed (or just left compressed) and cylinders that need to be uncompressed (or just left uncompressed). (d) Each data block I/O, whether a read or a write, will increment the temperature counter for that cylinder. These access counts feed into the temperature rating algorithm.

8


Block Level Compression In Teradata 14.0 6.2

AUTOCYLPACK

A separate background task called AutoCylPack will come along behind AutoTemp-Comp and will attempt to combine cylinders of recently compressed data blocks. This will only happen if a DBS Control File System parameter called AutoCylPack-Colddata is changed to TRUE. AutoCylPack-Colddata has a default of FALSE. Setting the default in this manner is intended to prevent AutoCylPack from reading cold cylinders, and inadvertently heating them up. However, the benefit of quick consolidation of compressed data blocks on to fewer cylinders will in most cases outweigh the risk of heating up the cylinders for temperature based compression.

9


Block Level Compression In Teradata 14.0 7. Additional Considerations Using Temperature Based BLC Consider these additional points having to do with Temperature -Based BLC: (a) Determining whether a cylinder is hot, warm or cold is based on statistics that are collected by the Teradata Virtual Storage (TVS) routines. Temperature is a relative value, not an absolute one. If you access half a table’s data each day, and the other half of the data every other day, the second group would be labeled colder than the first group, even though it is processed frequently. (b) When you first enable temperature based BLC on a non-TVS system, compression and decompression activity will not begin immediately. When first enabled, the TVS routines have to collect metrics for a period of time. It may be as long as a week before any meaningful metrics is produced. (c) Metrics will need to be collected over a long enough period of time to accurately represent most of the query patterns on a system. In order to achieve a fair view of activity on your system, consider enabling temperature based compression as a single first step, and run that way for a month, without specifying any tables for temperature based compression. (d) Compressing data could actually make the data have a warmer temperature than when it was uncompressed. This is due to the fact that after compression and cylinder consolidation, each cylinder contains more data than it previously had. Because temperature is measured at the cylinder level, a greater number of data blocks could mean a higher cylinder access count.

(e)

If you wish to disable automatic compression, but still be able to compress tables manually, without consideration of temperature, you can simply turn off the master parameter that controls the temperature based compression functionality: EnableTempBLC = FALSE.

One side-effect to be aware of is that some tables will be left in a mixed state, with some of the cylinders compressed, and other cylinders not. This is not necessarily a problem that needs to be fixed. 10


Block Level Compression In Teradata 14.0 8. CONCLUSION Block level compression on all the primary data blocks (as well as all the fallback data blocks, if present) has the potential for providing the greatest space reduction, but it is also the most costly in terms of decompression overhead if compressed tables are accessed or updated. Use block level compression on primary data when one or more of these situations exists:  The platform is extremely space-constrained, and achieving the maximum degree of compression is a priority  MVC combined with ALC does not offer enough size reduction  The database contains large tables which are: (a) Infrequently accessed and rarely updated (the optimal choice for BLC) (b) Frequently accessed or updated  AND lots of spare CPU cycles exist (CPU is <=60% utilized)  OR very strong workload management techniques restrict the CPU used for decompression

When to Use Temperature Based Block Level Compression: Temperature based compression is not suited for all tables. Tables that are historical and infrequently accessed are better defined in manual mode. Temperature based compression is most appropriate in these cases:  Large tables that experience frequent access to a subset of the data.  Tables that remains relatively stable in their temperature profile over time.  On large partitioned primary index (PPI) tables where most access activity is concentrated in the current partitions, but older partitions are relatively dormant.  On columnar tables, due to the clustering of frequently accessed data. If adequate amounts of space are not being saved using temperature based compression, consider raising the compression temperature threshold, which will increase the percent of cylinders that qualify for compression. Another alternative is to make a broader list of tables eligible for temperature based compression. 11


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.