Teradata Database
Fine Print • Nothing in this presentation constitutes a commitment to deliver any specific functionality at any specific time.
Key Features
What is a Temporal Database?
• Temporal – the ability to store all historic states of a given set of data (a database row) and, as part of the query, select a point in time at which to reference the data. Examples:
> What was this account balance (share price, inventory level, asset value, etc.) on this date?
> What data went into the calculation on 12/31/05, and what adjustments were made in 1Q06?
> On this historic date, what was the service level (contract status, customer value, insurance policy coverage) for said customer?
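Questions like these map directly onto temporal query qualifiers. A minimal sketch, assuming a hypothetical valid-time table named account_balance (table and column names are invented for illustration):

```sql
-- "What was this account balance on 2005-12-31?"
-- The VALIDTIME AS OF qualifier asks the database to answer
-- as of a past point in valid time.
VALIDTIME AS OF DATE '2005-12-31'
SELECT account_id, balance
FROM account_balance
WHERE account_id = 12345;
```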
• Three Types of Temporal Tables
> Valid Time Tables – when a fact is true in the modeled reality – user-specified times
> Transaction Time Tables – when a fact is stored in the database – system-maintained time, no user control
> Bitemporal Tables – both Transaction Time and Valid Time
• User Defined Time
> User can add time period columns and take advantage of the added temporal operators
> Database does not enforce any rules on user-defined time columns
Temporal Update – Bitemporal Table
Current valid time, current transaction time
Query: Jeans (125, 102) are sold today (2005-08-30).

With Temporal Support:
UPDATE objectlocation
SET LOCATION = 'External'
WHERE item_id = 125 AND item_serial_num = 102

Without Temporal Support, the equivalent change must maintain the time period columns manually and overwrites the prior location history.
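The slide's fragments can be fleshed out as follows. The table definition and temporal qualifier are a sketch based on Teradata's temporal syntax; any column names beyond those shown above are assumptions:

```sql
-- Assumed bitemporal table underlying the example (illustrative):
CREATE MULTISET TABLE objectlocation (
    item_id          INTEGER,
    item_serial_num  INTEGER,
    location         VARCHAR(30),
    item_duration    PERIOD(DATE) NOT NULL AS VALIDTIME,
    item_txn         PERIOD(TIMESTAMP(6) WITH TIME ZONE)
                     NOT NULL AS TRANSACTIONTIME
) PRIMARY INDEX (item_id);

-- With temporal support, one statement closes out the old row as of
-- today and opens the new current row; history is preserved:
CURRENT VALIDTIME
UPDATE objectlocation
SET location = 'External'
WHERE item_id = 125 AND item_serial_num = 102;
```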
Moving Current Date in PPI • Description > Support use of CURRENT_DATE and CURRENT_TIMESTAMP built-in functions in Partitioning Expression. > Ability to reconcile the values of these built-in functions to a newer date or timestamp using ALTER TABLE. – Optimally reconciles the rows with the newly resolved date or timestamp value. – Reconciles the PPI expression.
• Benefit > Users can define partitioning with 'moving' dates and timestamps with ease, instead of manually redefining the PPI expression using constants. – Date-based partitioning is a typical use for PPI. If a PPI is defined with a 'moving' current date or current timestamp, the partition that contains the most recent data can be kept as small as possible for efficient access.
> Required for Temporal semantics feature – provides the ability to define ‘current’ and ‘history’ partitions.
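A sketch of what this looks like in DDL; the table and column names are invented for illustration, and the partitioning bounds are one plausible configuration:

```sql
-- PPI defined with a 'moving' CURRENT_DATE:
CREATE TABLE sales (
    sale_id   INTEGER,
    sale_date DATE NOT NULL
) PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N(
    sale_date BETWEEN CURRENT_DATE - INTERVAL '11' MONTH
              AND     CURRENT_DATE
              EACH INTERVAL '1' MONTH,
    NO RANGE);

-- Periodically roll the window forward; rows are reconciled
-- to the newly resolved value of CURRENT_DATE:
ALTER TABLE sales TO CURRENT;
```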
Time Series Expansion Support • Description > New EXPAND ON clause added to SELECT to expand a row with a period column into multiple rows – EXPAND ON clause allowed in views and derived tables
> EXPAND ON syntax supports multiple ways to expand rows • Benefit > Permits time based analysis on period values
– Allows business questions such as ‘Get the month end average inventory cost during the last quarter of the year 2006’ – Allows OLAP analysis on period data
> Allows charting of period data in an Excel format > Provides infrastructure for sequenced query semantics on Temporal tables
Time Series Expansion Support • What will it do? > Expand a time period column and produce value-equivalent rows, one for each time granule in the period – Time granule is user specified
– Permits a period representation of the row to be changed into an event representation
> The following forms of expansion are provided:
– Interval expansion – by user-specified intervals such as INTERVAL '1' MONTH
– Anchor point expansion – by user-specified anchored points in a time line
– Anchor period expansion – by user-specified anchored time durations in a time line
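A sketch of interval expansion over a PERIOD column; the inventory table and its column names are hypothetical:

```sql
-- Each inventory row carries a PERIOD(DATE) column named duration;
-- EXPAND ON emits one row per month of that period within Q4 2006.
SELECT item_id, inv_cost, BEGIN(expd) AS month_start
FROM inventory
EXPAND ON duration AS expd
    BY INTERVAL '1' MONTH
    FOR PERIOD(DATE '2006-10-01', DATE '2007-01-01');
```

This is the mechanism behind questions like the month-end average inventory cost example above: once each period row is expanded into one row per granule, ordinary aggregation and OLAP functions apply.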
Geospatial Enhancements
• Description
> Enhancements to the Teradata 13 Geospatial offering drastically increase performance, add functionality, and provide integration points for partner tools
• Benefits > Increased performance by changing UDFs to Fast Path system functions > Replaces the Shape File Generator client tool (org2org) with a stored procedure for tighter integration with the database and tools such as ESRI ArcGIS > Provides geodetic distance methods – SphericalBufferMBR() > WFS Server provides better tool integration support for MapInfo and ESRI products
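A sketch of the kind of geodetic method call this enables. The cities table is hypothetical, and while SphericalDistance follows the ST_Geometry method naming used by Teradata's geospatial feature, treat the exact signature as an assumption:

```sql
-- Geodetic (spherical) distance in meters between two stored points:
SELECT a.city_name, b.city_name,
       a.geo.SphericalDistance(b.geo) AS distance_m
FROM cities a, cities b
WHERE a.city_name = 'London'
  AND b.city_name = 'Paris';
```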
ESRI ArcGIS Connecting to Teradata via Safe Software FME
1. FME connection in ArcView
2. Connect to Teradata via TPT API
4. Select Teradata tables for ArcView analysis
Projection of Impact Zone & Storm Path to Google Earth – Where do I deploy my cat management team?
Algorithmic Compression
• Description
> Provides the capability for users to define compression/decompression algorithms, implemented as UDFs, that are specified and applied to data at the column level in a row. Initially, Teradata will provide two compression/decompression algorithm pairs: one for UNICODE columns and another for LATIN columns.
• Benefit
> Data compression is the process by which data is encoded so that it consumes less physical storage space. This capability reduces both overall storage capacity needs and the number of physical disk I/Os required for a given operation. Additionally, because less physical data is operated on, there is the potential to improve query response time as well.
• Considerations
> Compressed data must be decompressed when required. This consumes some extra CPU cycles, but in general the advantages of compression outweigh the extra cost of decompression.
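A sketch of column-level algorithmic compression using the Teradata-supplied UNICODE algorithm pair; the function names follow the 13.10 documentation, but the table is invented, so verify the details against your release:

```sql
CREATE TABLE customer_notes (
    note_id INTEGER,
    -- Compress/decompress applied transparently at the column level:
    note    VARCHAR(1000) CHARACTER SET UNICODE
            COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
            DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode
) PRIMARY INDEX (note_id);
```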
Multi-Value Compression for VARCHAR Columns
• Example – Multi-Value Compression for a VARCHAR column:
CREATE TABLE Customer
  (Customer_Account_Number INTEGER
  ,Customer_Name VARCHAR(150) COMPRESS ('Rich','Todd')
  ,Customer_Address CHAR(200));
Block Level Compression
• Description
> Provides the capability to compress whole data blocks at the file system level before the data blocks are actually written to storage.
• Benefit
> Block-level compression reduces the actual storage required for data, especially cool/cold data, and significantly reduces the I/O required to read the data.
• Considerations
> There is a CPU cost to compress and decompress whole data blocks; this is generally considered a good trade since CPU cost is decreasing while I/O cost remains high.
User-Defined SQL Operators
• Description
> This feature allows users to define and encapsulate complex SQL expressions in a User-Defined Function (UDF) database object.
• Benefits
> SQL UDFs allow users to define their own functions written using SQL expressions. Previously, the desired SQL expression had to be written into the query for each use, or an external UDF written in another programming language to provide the same capability.
> Additionally, SQL UDFs allow one to define functions available in other databases and with alternative syntax (e.g. ANSI).
• Considerations
> The Teradata SQL UDF feature is a subset of the SQL function feature described in the ANSI SQL:2003 standard.
> Additionally, this feature does not change the definition of the Dictionary tables per se, but adds rows to the DBC.TVM and DBC.UDFInfo tables to indicate the presence of a SQL UDF.
SQL UDF – Example
• The "Months_Between" function:

CREATE FUNCTION Months_Between (Date1 DATE, Date2 DATE)
RETURNS INTERVAL MONTH(4)
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
PARAMETER STYLE SQL
RETURN (CAST(Date1 AS DATE) - CAST(Date2 AS DATE)) MONTH(4);

SELECT MONTHS_BETWEEN('2008-01-01', '2007-01-01');

MONTHS_BETWEEN('2008-01-01', '2007-01-01')
------------------------------------------
                                        12
Performance
Character-Based PPI (CPPI)
• Description
> This feature leverages current Teradata Partitioned Primary Index (PPI) technology and extends it to allow character data (CHAR, VARCHAR, GRAPHIC, VARGRAPHIC) as a table partitioning mechanism.
• Benefit
> Currently, only integer datatypes may be used as a partitioning mechanism in a PPI scheme, which facilitates superior query performance via partition elimination. Extending this capability to character-based datatypes allows more partitioning options and in turn yields query performance advantages similar to those the current PPI technology delivers today.
• Considerations
> As with all Teradata index or partitioning design choices, the Optimizer determines the appropriate index/PPI that provides the best-cost plan for executing the query. No end-user query modification is required.
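A sketch of a character-based partitioning expression; the table, column names, and range boundaries are invented for illustration:

```sql
CREATE TABLE customer (
    cust_id   INTEGER,
    last_name VARCHAR(30)
) PRIMARY INDEX (cust_id)
-- RANGE_N over character data: rows land in partitions by
-- last-name range, enabling partition elimination on predicates
-- such as WHERE last_name LIKE 'Sm%'.
PARTITION BY RANGE_N(
    last_name BETWEEN 'A', 'I', 'Q' AND 'ZZZZ',
    NO RANGE, UNKNOWN);
```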
Timestamp Partitioning
• Description
> Provides the capability for users to explicitly specify a time zone for PPI tables involving DateTime partitioning expressions in order to make the expressions deterministic (i.e., not dependent on the session time zone).
> Extends the PPI partition elimination capability to include timestamp data types in partitioning expressions.
• Benefit
> Ensuring that DateTime partitioning expressions are deterministic eliminates the possibility of errors that could result from incorrect dependence on session time zones.
> Extending this capability to timestamp data types allows more partitioning options and in turn yields query performance advantages similar to those the current PPI technology delivers today.
• Considerations
> Enhancements related to deterministic time zone handling are also applied to sparse join index search conditions.
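A sketch of pinning the time zone so the partitioning expression no longer depends on the session; the table is hypothetical and the exact AT TIME ZONE form should be checked against the 13.10 reference:

```sql
CREATE TABLE web_events (
    event_id INTEGER,
    event_ts TIMESTAMP(0) WITH TIME ZONE NOT NULL
) PRIMARY INDEX (event_id)
-- Casting through an explicit time zone makes the expression
-- deterministic regardless of the session time zone:
PARTITION BY RANGE_N(
    CAST(event_ts AT TIME ZONE 'GMT' AS DATE)
        BETWEEN DATE '2010-01-01' AND DATE '2010-12-31'
        EACH INTERVAL '1' DAY,
    NO RANGE);
```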
Fastpath Functions
• Description
> The Fastpath Function project combines the extensibility, short development cycles, and ease of use of UDFs with the high performance and ease of use of Teradata system functions to yield an alternate development path by which Teradata Engineering software developers may add new system functions to the Teradata server.
• Benefit
> Allows Teradata to use a shorter development cycle to fulfill many customer-specific requests for new system functions that additionally perform in the same manner as native Teradata system functions.
• Considerations
> Source code and/or libraries used in the development of Teradata system functions must be solely managed and maintained by Teradata Engineering. End users will not be able to develop Fastpath system functions.
FastExport – Without Spooling
• Description
> Enhance the FastExport utility to provide an option that allows the utility to execute in a mode that eliminates the requirement that the query data be spooled prior to the actual export process.
• Benefit
> The "direct without spooling" method provides a mechanism to extract data from a Teradata table quickly and efficiently, with the main benefit realized as a performance gain and minimal resource utilization.
• Considerations
> The "direct without spooling" method is not transparent to the user and must be specified as a discrete option when executing the FastExport utility.
Teradata Workload Management
TASM: Additional Workload Definitions
• Description
> Increases the number of available TASM Workload Definitions (WDs) to 250 (from 40).
• Benefits
> Complex mixed workloads require a finer degree of granular control over the parts of the workload. Increasing the number of WDs will allow customers to better manage and report on resource usage of their system to meet either subject-area (e.g. by country, application, or division) resource distribution requirements, or category-of-work (e.g. high vs. low priority) resource distribution requirements.
TASM: Common Classifications
• Description
> This feature provides the capability to have Workload Definition classification criteria available for Teradata Workload Management Categories 1, 2 and 3 (Filters, System Throttles and Workload Definitions), and additionally extends wildcard support to Filters and Throttles.
• Benefit
> The implementation of Common Classifications addresses the differences and delivers consistency between the TDWM categories (Filters, System Throttles and Workload Definitions), which improves the Teradata Workload Management user interface and its usability.
• Considerations
> Consideration should be given to re-evaluating the current settings for the different categories, insofar as common classification extends the ability to manage a workload in an easier and simpler fashion.
TASM: Common Classifications
• "Who" Criteria
> Account String / Account Name
> Teradata Username / Teradata Profile
> Application Name
> Client Address or Client Name
> QueryBand
• "Where" Criteria (Data Objects)
> Databases
> Tables / Views / Macros
> Stored Procedures
• "What" Criteria
> Statement Type (SELECT, DDL, DML)
> Utility Type
> AMP Limits, Row Count, Final Row Count
> Estimated Processing (CPU time)
> Join Types
– ALL or no joins
– ALL or no product joins
– ALL or no unconstrained product joins
TASM Utility Management
• Description
> This feature augments the existing Teradata Utility Management capability with TASM controls similar to the workload management of regular SQL requests, and provides automatic selection of the number of sessions used by Teradata utilities.
• Benefits
> Provides more granular and centralized control of utility execution and allows deployment to a much wider audience of users and applications. Additionally, the use of Teradata utility sessions is moved inside the database and is automated to eliminate the detailed management of sessions in each job.
• Considerations
> Consideration should be given to re-evaluating current rule sets and settings to maximize control of the workload and relative utility execution.
> Throttling in TASM eliminates the need for Tenacity and Sleep. Execution of queued jobs becomes FIFO, and queued jobs execute immediately when resources become available rather than at the end of the Sleep time.
TASM Utility Session Configuration Rules
• For FastLoad, MultiLoad, and FastExport utilities, the DBS default for number of AMP sessions is one per AMP.
• On a large system with hundreds or thousands of AMPs, this default becomes inappropriate.
• Currently, a user can override this default by changing individual load/export scripts, changing the MAXSESS parameter in the configuration file, or specifying runtime parameters (i.e., MAXSESS or –M).
• These overriding methods are inconvenient.
• This feature allows a DBA to define TDWM rules in one central place that specify the number of AMP sessions to be used based on a combination of the following criteria:
> Utility Name
> "Who" criteria (user, account, client address, query band, etc.)
> Data size
TASM Utility Session Configuration Rules • Session configuration rules are optional. • These rules are active when any category of TDWM is enabled. • In each session configuration rule, the DBA specifies the criteria and the number of sessions to be used when those criteria are met. • For example, for stand-alone MultiLoad jobs submitted by user Charucki, use 10 sessions. • Session configuration rules also support the Archive/Restore utility. • The DBA can define similar rules to specify the number of HUTPARSE sessions to be used for a specific set of criteria.
• A new internal DBSControl field: DisableTDWMSessionRules is provided to disable user-defined session configuration rules and default sessions rules while TDWM is enabled. • When this field is set, Client and DBS will operate as in Teradata 13.
Availability, Serviceability, DBA Tasks Improvements
Fault Isolation • Description > Remove cases where faults can cause restarts > Specific cases – EVL fault isolation – Unprotected UDFs – Dictionary cache re-initialization
• Benefits
> Identify and isolate the fault to only the query or session
> Issues in query calculation and qualification will be isolated
> Badly behaving UDFs will have less opportunity to affect the system
> Faults in the dictionary cache will result in the dictionary cache being flushed and reloaded rather than affecting the entire system
AMP Fault Isolation
• Description
> This feature is intended to catch AMP errors that currently cause DBS restarts, where the error can be dealt with by taking a snapshot dump and aborting the transaction that caused the error
• Benefit
> This feature can reduce the number of DBS restarts for customers, thus improving overall system availability
• What will it do?
> Current AMP fault isolation only avoids a full database restart for errors when accessing spool tables
> The scope of fault isolation will be increased to cover ERRAMP* and ERRFIL* errors on permanent tables as well as spools
Read From Fallback
• Description
> On encountering a data block read error (unreadable or corrupt data blocks), this feature leverages the pre-existing Fallback Table facility to transparently retrieve the required data block from the fallback copy.
• Benefit
> When fallback is available, this feature significantly improves fault tolerance and system availability. It significantly improves the value of having fallback and protects non-redundant (RAID 0 or JBOD) storage media, such as SSD, from data loss without restart/failover.
• Considerations
> Fallback does not need to be instantiated as a system-wide property; because fallback is a table-level attribute, it can be applied selectively to the largest/most critical customer tables.
> This facility does not in and of itself repair bad data blocks, but allows them to be read from fallback until they can be repaired.
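Because fallback is declared per table, opting critical tables in is a one-line DDL change; the table name here is illustrative:

```sql
-- Create a table with fallback protection from the start:
CREATE TABLE critical_orders, FALLBACK (
    order_id INTEGER,
    amount   DECIMAL(12,2)
) PRIMARY INDEX (order_id);

-- Or add fallback protection to an existing table:
ALTER TABLE critical_orders, FALLBACK;
```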
Read From Fallback – Particulars
• Reading data blocks from the fallback copy is transparent to both users and applications. No manual intervention is required.
• The feature does not require any special or particular locking mechanism.
• A manual process is still required to rebuild the table to repair unreadable or corrupt data blocks.
• The facility cannot recover from data block errors in the Cylinder Index, NUSIs, or Permanent Journals.
• Read errors are fallback-recoverable on TD Data Dictionary tables, with the exception of unhashed system tables such as the WAL log, Transient Journal and Space Accounting tables.
• The facility applies to SQL queries with data block read errors, SQL Insert…Select statements, and the Archive utility where the block read error is on the source table only.
Transparent Cylinder Packing
• Description
> A new file system background task pro-actively and transparently monitors the utilization (high or low) of user data cylinders and packs/unpacks those cylinders accordingly, with the goal of returning them to a more efficiently utilized state.
• Benefit
1. Cylinder packing results in cylinders having a higher data-block-to-cylinder-index ratio, making Cylinder Read operations more effective by reading fewer unoccupied sectors.
2. Higher cylinder utilization translates into data tables occupying fewer cylinders, leaving more cylinders available for other purposes.
3. Diminishes the chances that a "mini-cylpack" operation will be executed and lessens the need for administrators to perform regularly scheduled Packdisk operations.
• Considerations
> This feature has several customer-tunable parameters in DBSControl that allow customers to manage and adjust the level of impact of Transparent Cylinder Packing operations.
Merge Data Blocks During Full Table Modify Operations • Description > During full-table modification operations such as MultiLoad, Insert…Select, and Update or Delete Where, combine adjacent blocks when small blocks are present.
• Benefit
> Small data blocks increase the I/Os necessary to read a table and interfere with features such as compression and large cylinders.
> Reduces the instances of small data blocks by combining them when doing work on those blocks or adjacent ones.
Archive DBQL Rule Table • Description > Enhance the Teradata Archive utility to include two additional DBC tables to the DBC database (Dictionary) backup/restore: – DBC.DBQLRuleTbl – DBC.DBQLRuleCountTbl
• Benefit > Inclusion of the additional DBC tables in the DBC Archive/Restore process provides a mechanism by which these tables can be archived/restored, and altogether eliminates the cumbersome task of having to redefine the appropriate DBQL rules after every Dictionary initialization. > Implementation of this feature avoids the possibility of any table synchronicity issues and offers simplicity, convenience, and integrity when conducting a DBC archive/restore.
• Considerations > DBC Archive will include these tables automatically in the Dictionary Archive; no user intervention is required.
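The rules these tables hold are the ones created with BEGIN QUERY LOGGING; a minimal example (logging options illustrative):

```sql
-- This rule is recorded in DBC.DBQLRuleTbl / DBC.DBQLRuleCountTbl
-- and, with this feature, now survives a Dictionary archive/restore:
BEGIN QUERY LOGGING WITH SQL, OBJECTS ON ALL;
```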
Be Aware – Especially if Considering a Tech Refresh
Large Cylinder Support
• Description
> This feature increases data storage cylinder size, the basic allocation unit for disk space in the Teradata file system. This also includes an increase in the Cylinder Index size, allowing a commensurate increase in the number of data blocks stored per cylinder.
• Benefit
> Eliminates the inefficiency associated with managing a large number of small cylinders on very large disk drives, allows larger AMP sizes (~10 TB per AMP), permits more efficient storage of Large Objects, and provides the foundation for block-level compression by allowing more small blocks on a cylinder.
• Considerations
> This capability is only available starting in Teradata 13.10 and requires a System Initialization (SysInit) to be performed so that large cylinder support can be engaged. It is anticipated that this activity would typically be performed during technology refresh opportunities.
Packed Row Format for 64-bit Platforms
• Description
> With the introduction of Teradata 13.10, data will now be stored on the database in byte-packed format, whereas previously the data had been stored in byte-aligned format.
• Benefits
> Translates directly into a 4-7% disk space savings, insofar as less disk space is required to store byte-packed data than byte-aligned data. Additionally, enables data rows to be accessed using fewer I/Os, potentially enhancing the performance of some workloads.
• Considerations
> This capability is only available starting in Teradata 13.10 and requires a System Initialization (SysInit) to be performed.
Enhanced Teradata Hashing Algorithm
• Description
> Enhances the Teradata hashing algorithm to reduce the effects of irregularities in character data on hash results.
• Benefit
> This enhancement is targeted to reduce the number of hash collisions for character data stored as either Latin or Unicode, notably strings that contain primarily numeric data. Reducing hash collisions reduces access time per AMP and produces a more balanced row distribution, which in turn improves parallelism. Reduced access time and increased parallelism translate directly to better performance.
• Considerations
> This capability is only available starting in Teradata 13.10 and requires a System Initialization (SysInit) to be performed so that the enhanced hashing algorithm can be engaged. It is anticipated that this activity would typically be performed during technology refresh opportunities.
Quality / Support-ability
• AMP fault isolation
• Dictionary cache re-initialization
• Parser diagnostic information capture
• EVL fault isolation and unprotected UDFs
Performance
• FastExport without spooling
• Merge data blocks during full table modify operations
• Character-based PPI
• Statement independence
• Timestamp partition elimination
• TVS Initial suggested temperature tables
• User Defined Ordered Analytics
Active Enable
• Restart time reduction
• TASM: Utilities Management
• Read from Fallback
• TASM: Additional Workload Definitions
• TASM: Workload Designer

Ease of Use
• Teradata 13.10 Teradata Express Edition
• Moving current date in PPI
• Domain Specific System Functions
• Automatic cylinder packing
• Algorithmic Compression for Character Data
• VLC for VARCHAR columns
• Block level compression
• Variable fetch size (JDBC)
• User Defined SQL Operators
• Temporal Processing
• Temporal table support
• Period data type enhancements
• Replication support
• Time series Expansion support

Enterprise Fit
• Archive DBQL rule table
• Enhanced trusted session security
• External Directory support enhancements
• Geospatial enhancements
• Statement Info Parcel Enhancements (JDBC)
• Support for IPv6
• Support unaligned row format for 64-bit platforms
• Enhanced hashing algorithm
• Large cylinder support
TIB Academy