Intel® SDK for OpenCL* Applications 2013 User’s Guide
Document Number: 326791-003US Copyright © 2010–2013, Intel Corporation. All Rights Reserved
Contents

Legal Information
Getting Help and Support
Introduction
    Introducing Intel® SDK for OpenCL* Applications 2013
    Related Documents
    What's New?
    Related Products
Overview of OpenCL* Support on Intel® Core™ Processors
    Components and Packages
    The OpenCL* Platform on Intel® Architecture Processors
Supported OpenCL* Features
    OpenCL* 1.2 Full Profile
    OpenCL* Installable Client Driver (ICD)
    Shared Context
    OpenCL* Extensions and Optional Features
Utilizing CPU Device Resources
    Seamless Vectorization Using the Implicit CPU Vectorization Module
    Threading System
Developing with Intel® SDK for OpenCL* Applications 2013
    Developing with OpenCL* Platform
    Configuring Microsoft Visual Studio*
    Working with the OpenCL* Installable Client Driver (ICD)
Programming with the Intel® SDK for OpenCL* Applications 2013
    Intel® SDK for OpenCL* Applications 2013 Development Tools
    Building and Analyzing OpenCL* Kernel Performance
    Using the Intel® SDK for OpenCL* - Debugger
    Tuning with the Intel® Graphics Performance Analyzers
    Interoperability with the Intel® VTune™ Amplifier XE
Appendix A - Supported Images Formats
    Read-Only Surface Formats
    Write-Only Surface Formats
    Read-Write Surface Formats
Appendix B – OpenCL* Build and Linking Options
    Preprocessor Options
    Math Intrinsics Options
    Optimization Options
    Options for Warnings
    Options Controlling the OpenCL* C Version
Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel, Intel logo, Intel Core, VTune, Xeon are trademarks of Intel Corporation in the U.S. and other countries. * Other names and brands may be claimed as the property of others. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission from Khronos. Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Copyright © 2010-2013 Intel Corporation. All rights reserved.
Intel® SDK for OpenCL* Applications - User's guide for Windows* OS
Getting Help and Support

To get help with issues you encounter while using the Intel® SDK for OpenCL* Applications, visit the support forum at intel.com/software/opencl. For information on SDK requirements, known issues, and limitations, refer to the Intel® SDK for OpenCL* Applications 2013 - Release Notes at http://software.intel.com/en-us/articles/intel-sdk-foropencl-applications-2013-release-notes/.
Introduction

Introducing Intel® SDK for OpenCL* Applications 2013

This User’s Guide contains general information about the Intel® SDK for OpenCL* Applications 2013, its features, tools, package contents, and basic usage guidelines.

OpenCL (Open Computing Language) is an open standard for general-purpose parallel programming on a diverse mix of multi-core CPUs, graphics processors, and other parallel processors. OpenCL provides a flexible execution model and a uniform programming environment in which software developers can write portable code for client systems that runs on both the CPU and graphics processors such as Intel® HD Graphics.

OpenCL is developed by multiple companies through the Khronos* OpenCL committee. Intel, a founding member of the OpenCL Working Group, has been a leading voice to ensure the OpenCL feature set supports OpenCL programmers on current and future Intel® Architecture.
About Intel® SDK for OpenCL* Applications 2013

Intel® SDK for OpenCL* Applications 2013 is a comprehensive software development environment for OpenCL applications on the 3rd and the 4th Generation Intel® Core™ processors, which support OpenCL 1.2 on Windows 7* and Windows 8* operating systems. For information on Intel® SDK for OpenCL* Applications 2013 limitations and known issues, please see the Intel® SDK for OpenCL* Applications 2013 Release Notes.

See Also
Intel SDK for OpenCL Applications - Release Notes
Intel HD Graphics Developers Web Site

Related Documents

For information on SDK limitations, optimization guidelines, and samples, refer to the following documents:
1. Intel® SDK for OpenCL* Applications - Release Notes
2. Intel SDK for OpenCL Applications - Optimization Guide
3. OpenCL 1.2 Specification at http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf
4. Intel SDK for OpenCL Applications - Samples Getting Started
Intel® SDK for OpenCL* Applications 2013 documentation, samples, and technical articles are available at intel.com/software/opencl/.
What's New?

The new Intel® SDK for OpenCL* Applications 2013 brings improvements and new features both in the OpenCL runtime and the development tools.

The Intel SDK for OpenCL Applications 2013 now works seamlessly with the latest edition of the Intel® HD Graphics driver, which includes certified OpenCL 1.2 support on the 3rd and the 4th Generation Intel® Core™ processors with Intel HD Graphics on Microsoft Windows* 7 and Windows 8* operating systems. OpenCL 1.2 support promises better performance and greater stability on both the CPU and the Intel HD Graphics. OpenCL 1.2 provides more flexibility in software design for OpenCL programmers with improved compilation, linking, and library support, and improved graphics and media surface sharing for full media and graphics acceleration on the Intel HD Graphics.

This version of the SDK comprises developer tools and provides a comprehensive environment for the build, debug, and tune stages of OpenCL application development. For performance analysis, this environment enables the leading Intel profiling applications, Intel® VTune™ Amplifier XE and Intel® Graphics Performance Analyzers (GPA). New tools available with the Intel SDK for OpenCL Applications 2013 ease build and optimization of OpenCL applications running on Intel HD Graphics.

For more information on what is new, please see the SDK release notes.
See Also
Intel VTune Amplifier XE website
Intel Graphics Performance Analyzers website
Using the Intel SDK for OpenCL Applications - Kernel Builder
Working with the Intel SDK for OpenCL Applications - Offline Compiler - Command-Line Version

Related Products

The following products are related to the Intel® SDK for OpenCL* Applications:
• Intel® Graphics Performance Analyzers (Intel® GPA)
• Intel® VTune™ Amplifier XE
• Intel® Media SDK
• Intel® Perceptual Computing SDK
Overview of OpenCL* Support on Intel® Core™ Processors

Components and Packages

For full OpenCL* support on 3rd and 4th Generation Intel® Core™ Processors, the following components are available.

Intel® SDK for OpenCL* Applications 2013 Package with OpenCL* 1.2

The SDK package is a comprehensive software developer kit for OpenCL* applications on Intel® Architecture processors. This package includes Intel OpenCL headers, libraries, developer tools such as the kernel builder and kernel debugger, and integration with profiling tools. The SDK installer also includes the standalone CPU-only runtime package, which provides fully conformant support for OpenCL 1.2 features.

Intel SDK for OpenCL - CPU Runtime Only Package with OpenCL 1.2

This standalone CPU-only OpenCL 1.2 runtime can be installed on all generations of Intel Core Processors. Applications can also use it to ensure backward compatibility and coverage for Intel Core Processors without Intel® HD Graphics. This CPU runtime includes redistribution rights.

OpenCL 1.2 Support on CPU and Intel® HD Graphics

The latest version of the Intel® HD Graphics driver includes the OpenCL* runtime and compiler for both the Intel® Processor (CPU) and the Intel HD Graphics (GPU). This driver is available with 3rd and 4th generation Intel Core Processors with the Intel HD Graphics. To develop OpenCL applications with the SDK, you need to download the latest edition of the driver. OpenCL applications developed with the Intel SDK for OpenCL Applications 2013 can run seamlessly on a wide range of the 3rd and the 4th Generation Intel Core Processors with Intel HD Graphics and a preinstalled Intel HD Graphics driver.

Code Samples

Free downloadable code samples and optimization best known methods for OpenCL* on Intel® Architecture are available online through the SDK page at intel.com/software/opencl/.

Intel® SDK for OpenCL* Applications Website

The SDK web site at intel.com/software/opencl provides links to videos, technical articles, case studies, and support forums and is an integral component of this SDK.

The Intel SDK for OpenCL Applications Software Stack

Use the Intel SDK for OpenCL Applications 2013 to develop applications and then seamlessly deploy and run them with the Intel HD Graphics driver.
See Also
Intel SDK for OpenCL Applications - Samples Getting Started
Intel® Driver Update Utility website

The OpenCL* Platform on Intel® Architecture Processors

Intel provides a single OpenCL* platform to seamlessly access the compute power across both Intel® Processors (CPUs) and Intel® HD Graphics in a standard manner. The Platform model for OpenCL on the 3rd and the 4th Generation Intel® Core™ processor family is defined in the following figure.
The OpenCL platform consists of two OpenCL devices:
1. Intel® Processor – a CPU device
2. Intel® HD Graphics – a GPU device

Platform Info ID        Value
CL_PLATFORM_VENDOR      Intel® Corporation
CL_PLATFORM_NAME        Intel® OpenCL

NOTE The Intel platform on Intel® Xeon® processors or on previous generations of Intel Core processors includes only the CPU device.

The following are the device type IDs for Intel hardware with OpenCL 1.2 full profile:

Device Type ID          Hardware Device
CL_DEVICE_TYPE_GPU      Intel HD Graphics
CL_DEVICE_TYPE_CPU      Intel Core Processor
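
When several OpenCL platforms are installed, an application typically picks the Intel platform by comparing vendor strings. The following is a minimal sketch of that comparison only, written in plain C so that it does not depend on the OpenCL headers. The function name find_intel_platform is hypothetical, and the sketch assumes the caller has already read each platform's CL_PLATFORM_VENDOR string (for example with clGetPlatformInfo) into an array. It matches on the "Intel" prefix rather than the full string, because the exact spelling of the trademark symbol in the returned string is an assumption here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first platform whose vendor
 * string starts with "Intel", or -1 if there is none.
 * Assumption: vendors[i] holds the CL_PLATFORM_VENDOR string of platform i,
 * already queried by the caller. */
int find_intel_platform(const char *const vendors[], size_t count)
{
    static const char prefix[] = "Intel";
    size_t i;

    if (vendors == NULL)
        return -1;

    for (i = 0; i < count; ++i) {
        /* Compare only the prefix so that "Intel(R) Corporation" and
         * similar spellings are all accepted. */
        if (vendors[i] != NULL &&
            strncmp(vendors[i], prefix, sizeof(prefix) - 1) == 0)
            return (int)i;
    }
    return -1;
}
```

The returned index can then be used to select the matching cl_platform_id from the list returned by clGetPlatformIDs.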
See Also OpenCL 1.2 Full Profile
Supported OpenCL* Features

OpenCL* 1.2 Full Profile

Intel® SDK for OpenCL* Applications 2013 provides support for OpenCL 1.2 features on both Intel® Architecture CPU and Intel® HD Graphics devices. Both devices are compliant with the OpenCL 1.2 specification.
See Also OpenCL* 1.2 Specification at http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf
OpenCL* Installable Client Driver (ICD) The OpenCL* Installable Client Driver (ICD) enables different OpenCL implementations to coexist on the same system. ICD also enables applications to select between OpenCL implementations at run time. Intel® SDK for OpenCL Applications 2013 supports the OpenCL 1.2 ICD.
See Also
Shared Context for CPU and Intel® HD Graphics
Working with the OpenCL Installable Client Driver (ICD)
Shared Context

Shared Context for CPU and Intel® HD Graphics

The Intel implementation of the OpenCL* standard supports a context for multiple devices, also called a “shared context”. An OpenCL context that contains both the CPU and the Intel® HD Graphics (GPU) devices enables the devices to share memory objects and events. This feature eases development of workloads that run across the platform (CPU and GPU devices).

One way to create a shared context:

    shared_context = clCreateContextFromType(prop, CL_DEVICE_TYPE_ALL, …);

Do not specify CL_DEVICE_TYPE_ALL if the application targets a single-device context (either CPU or GPU). In this case, specify CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_GPU explicitly to indicate the device that the context targets.

Another way to create a shared context is to provide the list of devices explicitly:

    cl_device_id devices[2] = {cpuDeviceId, gpuDeviceId};
    cl_context shared_context = clCreateContext(prop, 2, devices, …);

For more information on the functionality of a shared context, self-management and application level management, see the OpenCL 1.2 specification.

You do not need to manage memory object mirroring or migration between the devices of a context. Just avoid concurrent “Write” access to the same memory object by different devices, as stated in the OpenCL 1.2 Specification.

The following extensions are not supported by shared context:
• cl_khr_gl_sharing
• cl_khr_d3d10_sharing
• cl_khr_d3d11_sharing
• cl_intel_dx9_media_sharing
See Also OpenCL 1.2 Specification at http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf
Resource Sharing

OpenCL* memory objects created on a shared context are shared among the context devices. Applications do not need to copy them (using a "Read" or "Write" operation) between the context devices to process them on the target device. Intel OpenCL devices use true resource sharing in a shared context. There is no hidden memory copy when processing the memory object on different devices. However, there might still be a copy on clEnqueueMapBuffer[Image] or clEnqueueUnmapMemObject of memory objects that were created with the CL_MEM_USE_HOST_PTR flag set. The Intel® SDK for OpenCL Applications - Optimization Guide provides more information on how to avoid memory synchronization overhead with the host application.

See Also
Supported OpenCL* Extensions and Optional Features
Intel SDK for OpenCL Applications - Optimization Guide
OpenCL* Extensions and Optional Features

Supported OpenCL* Extensions and Optional Features

For information on OpenCL* 1.2 feature support on Intel® Architecture CPU and Intel® HD Graphics devices, see the following table:

Extension                        Intel HD Graphics (GPU) Device    CPU Device
cl_khr_int64_base_atomics        No                                No
cl_khr_int64_extended_atomics    No                                No
cl_khr_3d_image_writes           Yes                               No
cl_khr_fp16                      No                                No
cl_khr_fp64                      No                                Yes
cl_khr_gl_sharing                Yes                               Yes
cl_khr_gl_event                  Yes                               No
cl_khr_depth_images              Yes                               No
cl_khr_gl_depth_images           Yes                               No
cl_khr_gl_msaa_sharing           Yes                               No
cl_khr_d3d10_sharing             Yes                               No
cl_khr_d3d11_sharing             Yes                               Yes
cl_khr_dx9_media_sharing         Yes                               Yes
The following extensions were merged into the OpenCL 1.2 specification and are still supported for software compatibility:
• cl_ext_device_fission extension supports both OpenCL 1.1 EXT APIs and OpenCL 1.2 fission core features.
• cl_intel_dx9_media_sharing extension supports the Intel media sharing extension and also the OpenCL 1.2 KHR media sharing extension (cl_khr_dx9_media_sharing).
• cl_intel_printf extension, aligned with the OpenCL 1.2 printf function.
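
Because extension support differs between the CPU and the Intel HD Graphics devices, applications usually check for an extension at run time before relying on it. OpenCL reports device extensions as a single space-separated string (the CL_DEVICE_EXTENSIONS device info). The following is a minimal sketch of the token check only, in plain C without the OpenCL headers. The helper name has_extension is hypothetical, and the sketch assumes the caller has already read the extension string with clGetDeviceInfo.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * space-separated token in `ext_list`, and 0 otherwise.
 * Assumption: ext_list is the string reported for CL_DEVICE_EXTENSIONS. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    if (ext_list == NULL || name == NULL || *name == '\0')
        return 0;

    len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is bounded by spaces or by the ends
         * of the string, so that a name that is merely a prefix of a
         * longer extension name is not reported as supported. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        ++p;
    }
    return 0;
}
```

A plain strstr call is not enough here: a query for a shorter name would also match inside any longer extension name that contains it, which is why the sketch checks the token boundaries.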
Image Support

Both CPU and Intel® HD Graphics devices support Image Objects (CL_DEVICE_IMAGE_SUPPORT property of device information). For the full list of supported image formats, see "Appendix A - Supported Images Formats" in this guide.

Supported Devices
Intel CPU                    Yes
Intel HD Graphics (GPU)      Yes
See Also Appendix A – Supported Images Formats: Read-Only Surface Formats.
Writing to the 3D Image Memory Object

Supported Devices
Intel® CPU                   No
Intel® HD Graphics (GPU)     Yes

The Intel HD Graphics device implements the cl_khr_3d_image_writes extension to support writes to a 3D image memory object.
DirectX 9* Media Sharing Extension

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     Yes

Both the CPU and the Intel HD Graphics (GPU) devices support the following OpenCL* 1.2 extensions:
• cl_khr_dx9_media_sharing extension, which provides interoperability between OpenCL and selected adapter APIs
• cl_intel_dx9_media_sharing extension, which provides interoperability between OpenCL and Microsoft DirectX* 9 API, specifically DirectX 9 media surfaces
• clEnqueueReleaseDX9ObjectsINTEL and clEnqueueAcquireDX9ObjectsINTEL

The CPU and GPU devices react to the clEnqueueReleaseDX9ObjectsINTEL and clEnqueueAcquireDX9ObjectsINTEL functions differently:
• The command is synchronous on the Intel HD Graphics device and asynchronous on the CPU device.
• On the Intel HD Graphics device, you can safely call LockRect on the Microsoft Direct 3D* API surface object after this function returns.
• The CPU device queues the command like any other enqueue API function. You should wait for the event argument to make sure the command has executed.
For specifications of Khronos*-approved and vendor-approved OpenCL extensions please visit the Khronos* OpenCL* API Registry at http://www.khronos.org/registry/cl/.
See Also Khronos* OpenCL* API Registry at http://www.khronos.org/registry/cl/
DirectX 10* Sharing

Supported Devices
Intel® CPU                   No
Intel® HD Graphics (GPU)     Yes

Enables sharing of OpenCL* and DirectX* 10 API resources with cl_khr_d3d10_sharing.
DirectX 11* Sharing

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     Yes

Enables sharing of OpenCL* 1.2 and DirectX 11* API resources.
OpenGL* Sharing

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     Yes

Enables sharing of OpenGL* and OpenCL* resources. The following extensions are supported:
• cl_khr_gl_sharing - creating an OpenCL context from an OpenGL context or share group, on Microsoft Windows* operating systems only.
• cl_khr_gl_event - sharing memory objects with OpenGL or OpenGL ES buffer, texture, and renderbuffer objects, on Microsoft Windows OS only. Event support is available only on the Intel HD Graphics device.
Sharing of OpenCL* and OpenGL* MSAA Textures

Supported Devices
Intel® CPU                   No
Intel® HD Graphics (GPU)     Yes

This feature extends OpenGL* sharing (the cl_khr_gl_sharing extension) to enable an OpenCL* image to be created from an OpenGL multi-sampled (MSAA) texture (color or depth). The following extensions are supported:
• cl_khr_gl_msaa_sharing – MSAA support.
• cl_khr_depth_images – adds support for depth images in OpenCL.
• cl_khr_gl_depth_images – enables an OpenCL image to be created from an OpenGL depth or depth-stencil texture.
Double Precision Floating Point

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     No

The CPU device implements cl_khr_fp64 to support double precision floating-point. Double precision floating-point is a requirement for some scientific computing algorithms and applications.
Device Fission

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     No

Device Fission is a new OpenCL* 1.2 core feature. It enables you to control utilization of compute units within a compute device. The OpenCL CPU device supports both the OpenCL 1.2 core feature and the cl_ext_device_fission extension. The CPU device available with the Intel® SDK for OpenCL Applications 2013 supports the following fission modes:
• CL_DEVICE_PARTITION_EQUALLY_EXT enables you to create equally-sized sub-devices, each containing the provided number of compute units. Any remaining compute units of the parent device are not used.
• CL_DEVICE_PARTITION_BY_COUNTS_EXT enables you to specify a list of sizes. A sub-device of the appropriate size is created for every item on the list. The sum of the sizes provided for the sub-devices must not exceed the size available to the parent device.
• CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT enables you to split a device according to an architectural feature of the parent device. Intel SDK for OpenCL Applications 2013 supports only the CL_AFFINITY_DOMAIN_NUMA_EXT property, which creates a sub-device for every Non-Uniform Memory Access (NUMA) node on the platform.

Device Fission can improve the performance of applications that use OpenCL. To gain the most from this feature, follow these guidelines:
• Creating sub-devices has an overhead that depends on the number of sub-devices created. For better results:
    o Use clCreateSubDevicesEXT to create exactly as many sub-devices as you are going to use in your program.
    o To create a sub-device with half the compute units of a parent, use BY_COUNTS with a single item to create only one sub-device, instead of using EQUALLY, which creates two.
    o The fewer sub-devices you create, the more performance you gain.
• Avoid simultaneously executing commands in the following configurations:
    o on a sub-device and its ancestor
    o on sub-devices that could overlap compute units
    o on sub-devices from different partitioning modes
    o on sub-devices from different calls to clCreateSubDevicesEXT that must overlap if executed simultaneously. For example, sub-devices created BY_COUNTS that total more than the number of compute units available on the parent device must overlap.
• Use the CL_AFFINITY_DOMAIN_NUMA_EXT property to achieve the best performance, by ensuring allocation of the physical pages on the correct node, and creating memory objects with the USE_HOST_PTR property.
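
The partition rules above come down to simple arithmetic on compute unit counts. The following is a minimal sketch of that arithmetic in plain C, useful for validating a partition request before calling the fission API. It does not call OpenCL, and all three function names are hypothetical. Treating a zero-sized entry in a BY_COUNTS list as invalid is an assumption of this sketch, not a rule stated in this guide.

```c
#include <stddef.h>

/* EQUALLY mode: number of sub-devices created when each sub-device gets
 * `units_per_sub` compute units. Returns 0 for an invalid request. */
unsigned equal_partition_count(unsigned parent_units, unsigned units_per_sub)
{
    if (units_per_sub == 0 || units_per_sub > parent_units)
        return 0;
    return parent_units / units_per_sub;
}

/* EQUALLY mode: compute units of the parent that remain unused. */
unsigned equal_partition_unused(unsigned parent_units, unsigned units_per_sub)
{
    if (units_per_sub == 0 || units_per_sub > parent_units)
        return parent_units;
    return parent_units % units_per_sub;
}

/* BY_COUNTS mode: returns 1 if the list of sizes fits into the parent
 * device, that is, the sum of the sizes does not exceed parent_units.
 * Assumption: a zero-sized entry is treated as invalid. */
int counts_partition_fits(const unsigned *counts, size_t n,
                          unsigned parent_units)
{
    unsigned long long total = 0;
    size_t i;

    if (counts == NULL || n == 0)
        return 0;
    for (i = 0; i < n; ++i) {
        if (counts[i] == 0)
            return 0;
        total += counts[i];
    }
    return total <= parent_units;
}
```

For example, on a parent device with 8 compute units, an EQUALLY request for 3 units per sub-device yields 2 sub-devices and leaves 2 units unused, while a BY_COUNTS list with the single entry 4 creates one sub-device with half of the parent's units, as the guidelines recommend.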
Intel Immediate Command Execution Extension

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     No

The CPU device supports the cl_intel_exec_by_local_thread extension. Immediate Command Execution is an extension that enables you to execute OpenCL* commands in a single-threaded manner, using the calling thread to perform the actual execution.

To use this extension, add the token CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL to the queue properties when calling clCreateCommandQueue. clEnqueueXXX calls to that queue are synchronous: they return only after the queued command finishes executing. Moreover, only the thread that makes the clEnqueueXXX call executes the command. This includes calls to clEnqueueNDRangeKernel.

Using this extension, you can create a command queue alongside the rest of the queues, and use it to execute lightweight kernels or NDRanges with a small global size, which cannot gain much from the Intel multi-core architecture. You still get the full benefits of the compiler, including the automatic vectorization module.

An Immediate Command Execution queue can be in-order or out-of-order. In the in-order mode, if multiple threads enqueue commands to the same queue at the same time, they block each other to comply with the OpenCL* in-order queue semantics. Therefore, you should use the combination of CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL and CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE.
NOTE The extension tokens are defined in the cl_ext.h file, which is located under: INTELOCLSDKROOT\include\CL.
See Also Out-Of-Order Execution
Out-of-order Execution

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     No

The CPU device supports an out-of-order execution model for kernels and memory objects in the device command queue (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command queue).
Native Kernels

Supported Devices
Intel® CPU                   Yes
Intel® HD Graphics (GPU)     No

The CPU device supports the execution of native kernels (CL_EXEC_NATIVE_KERNEL option of the CL_DEVICE_EXECUTION_CAPABILITIES property of device information). Native kernels are accessed through a host function pointer. Native kernels are queued for execution along with OpenCL* kernels on a device and share memory objects with OpenCL* kernels. For example, these native kernels could be functions defined in application code or exported from a library.
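
A native kernel is an ordinary host function that takes a single void pointer to its argument block, which is the shape that clEnqueueNativeKernel expects for its user function. The following is a minimal sketch of such a function in plain C. The struct scale_args and the function scale_native_kernel are hypothetical names chosen for illustration, and the sketch calls the function directly instead of enqueuing it, so it does not require an OpenCL runtime.

```c
#include <stddef.h>

/* Hypothetical argument block for the native kernel below. When the
 * function is enqueued with clEnqueueNativeKernel, the runtime passes a
 * pointer to its own copy of this block to the function. */
typedef struct {
    const float *input;   /* source values */
    float *output;        /* destination values */
    size_t count;         /* number of elements */
    float scale;          /* multiplier applied to every element */
} scale_args;

/* Native kernel: a host function with the void (*)(void *) shape.
 * Multiplies every input element by `scale`. */
void scale_native_kernel(void *raw_args)
{
    scale_args *args = (scale_args *)raw_args;
    size_t i;

    if (args == NULL || args->input == NULL || args->output == NULL)
        return;
    for (i = 0; i < args->count; ++i)
        args->output[i] = args->input[i] * args->scale;
}
```

In a real application, pointers inside the argument block that refer to OpenCL memory objects are stored as cl_mem handles, and the runtime replaces them with host pointers before it calls the function. See the description of clEnqueueNativeKernel in the OpenCL 1.2 specification for the exact argument layout.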
Interoperability with Media and Graphics APIs

Intel® SDK for OpenCL* Applications 2013 provides interoperability with other graphics and media APIs such as Microsoft DirectX*, OpenGL*, and the Intel® Media SDK. Graphics and media interoperability enables applications that use these APIs to benefit from true surface sharing and zero copy on the Intel® HD Graphics OpenCL device, when used according to the conditions in the table below:

Extension: DirectX 9 Media Surface Sharing
Condition: Provide a non-NULL pSharedHandle to clCreateFromDX9MediaSurfaceKHR to implement true sharing.

Extension: DirectX 10 and 11 Sharing Memory Objects
Condition: Create a resource with the D3D10_RESOURCE_MISC_SHARED or D3D11_RESOURCE_MISC_SHARED flag specified to implement true sharing.

Extension: OpenGL Sharing Memory Objects
Condition: Depends on your hardware state. No specific way for the application to implement true sharing.
For information on pSharedHandle, see the “Feature Summary (Direct3D 9 for Windows Vista)” article. Use the DirectX* 9 API also for Intel® Media SDK interoperability. For more details on how to take advantage of the interoperability between OpenCL* and Intel Media SDK, see the Intel® Media SDK Interoperability code sample. For more information on D3D10_RESOURCE_MISC_SHARED and D3D11_RESOURCE_MISC_SHARED, see the “Feature Summary (Direct3D 9 for Windows Vista)” article.

See Also
Seamless Vectorization Using the Implicit CPU Vectorization Module
Feature Summary (Direct3D 9 for Windows Vista) at http://msdn.microsoft.com/en-us/library/windows/desktop/bb219800(v=vs.85).aspx
Intel SDK for OpenCL Applications - Samples Getting Started
Intel Media SDK website
Interoperability with Intel Media SDK sample
Utilizing CPU Device Resources

Seamless Vectorization Using the Implicit CPU Vectorization Module

The CPU OpenCL* device available with the Intel® SDK for OpenCL Applications 2013 provides a smart OpenCL compilation environment, which efficiently maps the code to the processor vector units and includes a unique implicit CPU vectorization module for best utilization of the hardware vector units across work items. The vectorization module aims to merge together the execution of several work items, utilizing the Intel vector instruction set. This vectorization module seamlessly extends the utilization of the vector unit when moving from one processor generation to another, for example, from Intel® Streaming SIMD Extensions (Intel® SSE) to Intel® Advanced Vector Extensions (Intel® AVX). The compilation environment is based on the open source LLVM compiler and its clang front end.

See Also
Efficient, Highly Scalable Threading System
Auto Vectorization of OpenCL code with the Intel SDK for OpenCL Applications website
Threading System
The CPU OpenCL* device available with the Intel® SDK for OpenCL Applications 2013 contains an efficient, highly scalable threading system for optimal multi-core performance. The system is based on the Intel® Threading Building Blocks (Intel® TBB). This runtime feature enables OpenCL applications to utilize the multi-core CPU seamlessly.
See Also Intel Threading Building Blocks website
Developing with Intel® SDK for OpenCL* Applications 2013
Developing with OpenCL* Platform
The binaries of the Intel® SDK for OpenCL* Applications 2013 are installed to the following directories:
1. $(INTELOCLSDKROOT)\bin\x86 for 32-bit applications
2. $(INTELOCLSDKROOT)\bin\x64 for 64-bit applications
where the environment variable INTELOCLSDKROOT is automatically added to the system during installation and points to the SDK installation directory. This directory is automatically added to the system PATH environment variable. For more information, see the Intel SDK for OpenCL Applications 2013 - Release Notes.
To work with the OpenCL runtime, an application should link to the OpenCL Installable Client Driver (ICD) import library. The library is installed to:
1. $(INTELOCLSDKROOT)\lib\x86 for 32-bit applications
2. $(INTELOCLSDKROOT)\lib\x64 for 64-bit applications.
For more information on how to use OpenCL* ICD, see the "Working with the OpenCL Installable Client Driver (ICD)" section.
See Also Intel SDK for OpenCL Applications - Release Notes Configuring Microsoft* Visual Studio* Working with the OpenCL* Installable Client Driver (ICD)
Configuring Microsoft Visual Studio*
The Intel® SDK for OpenCL* Applications supports development with Microsoft Visual Studio* versions 2008, 2010 and 2012. To set up a Microsoft Visual Studio project to work with Intel SDK for OpenCL Applications 2013 (applies to all supported versions of Microsoft Visual Studio):
1. Open the project property pages by selecting Project > Properties.
2. In the C/C++ > General property page, under Additional Include Directories, enter the full path to the directory where the OpenCL header files are located: $(INTELOCLSDKROOT)\include.
3. In the Linker > General property page, under Additional Library Directories, enter the full path to the directory where the OpenCL run-time import library file is located. For example, for a 32-bit application: $(INTELOCLSDKROOT)\lib\x86
4. In the Linker > Input property page, under Additional Dependencies, enter the name of the OpenCL ICD import library file: OpenCL.lib.
Working with the OpenCL* Installable Client Driver (ICD)
When using the OpenCL* Installable Client Driver (ICD), the application is responsible for selecting the OpenCL platform to use. If several OpenCL platforms are installed on the system, the application should use the clGetPlatformIDs and clGetPlatformInfo functions to query the available OpenCL platforms and decide which one to use. The following example shows how to query the system platforms to get the correct platform ID:
    cl_platform_id *platforms = NULL;
    char vendor_name[128] = {0};
    cl_uint num_platforms = 0;
    // get number of available platforms
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (CL_SUCCESS != err)
    {
        // handle error
    }
    platforms = (cl_platform_id*)malloc(sizeof(cl_platform_id) * num_platforms);
    if (NULL == platforms)
    {
        // handle error
    }
    // get the IDs of all available platforms
    err = clGetPlatformIDs(num_platforms, platforms, NULL);
    if (CL_SUCCESS != err)
    {
        // handle error
    }
    for (cl_uint ui = 0; ui < num_platforms; ++ui)
    {
        // query the vendor string of each platform
        err = clGetPlatformInfo(platforms[ui], CL_PLATFORM_VENDOR, sizeof(vendor_name), vendor_name, NULL);
        if (CL_SUCCESS != err)
        {
            // handle error
        }
        // the runtime reports the vendor name in plain ASCII
        if (!strcmp(vendor_name, "Intel(R) Corporation"))
        {
            cl_platform_id selected = platforms[ui];
            free(platforms);
            return selected;
        }
    }
    // no Intel platform found: free the list and handle the error
    free(platforms);
Programming with the Intel® SDK for OpenCL* Applications 2013
Intel® SDK for OpenCL* Applications 2013 Development Tools
Intel® SDK for OpenCL* Applications 2013 provides a comprehensive environment for the build, debug, and tune stages of your OpenCL application development. In addition, the SDK takes advantage of the following Intel profiling tools:
1. Intel® VTune™ Amplifier XE
2. Intel® Graphics Performance Analyzers (Intel® GPA).
The table below shows the availability of tools and OpenCL devices on Windows* OS.
                                                        Device Available
Tool                                                    CPU    GPU
Kernel Builder (standalone)                             Yes    Yes
Offline Compiler Command-Line Interface (standalone)    Yes    Yes
Offline Compiler (Microsoft Visual Studio* plug-in)     Yes    Yes
Step by Step Kernel Debugger                            Yes    No
Intel VTune Amplifier XE                                Yes    Preview Feature
Intel GPA System Analyzer                               Yes    Yes
Intel GPA Platform Analyzer – OpenCL Tuning             Yes    Yes
Building and Analyzing OpenCL* Kernel Performance
Compiling Your Program Using the Intel® SDK for OpenCL* - Kernel Builder
Intel® SDK for OpenCL* Applications 2013 enables you to build and analyze your OpenCL* kernels. The tool provides full offline OpenCL language compilation, which includes:
• OpenCL* syntax checker
• Cross-hardware platform compilation
• Low Level Virtual Machine (LLVM) viewer
• Assembly code viewer
• Intermediate program binary generator.
The Intel® SDK for OpenCL* - Kernel Builder standalone application enables you to:
• assign input to the kernel and test its correctness
• analyze kernel performance based on:
  • group sizes
  • build options
  • device.
The Kernel Builder enables you to build an application that calls the kernel, and to test it with specific input.
See Also Building OpenCL Kernels
Building OpenCL* Kernels
1. Use the OpenCL* Code text box to write your OpenCL code, or load the code from a file. You can load your code from a file by:
   o Clicking the Open button in the toolbar
   o Selecting File > Open
   o Pressing Ctrl + O
   o Dragging the file into the application window.
2. Click the Build button in the commands toolbar, or select Build > Build File from the main menu to build the OpenCL code.
3. Look at the Console text box below to see information on the build status. If the build succeeds, the text box background color turns green; otherwise, it turns red.
4. Upon failure, use the Line and Column coordinates presented at the bottom right corner to navigate to the problematic line, or double-click the error line in the Console text box to jump to the relevant line automatically.
Configuring the Build Options
To configure the build options for the OpenCL* code, do the following:
1. Select Build > Build/Compile Options...
2. In the Options window, enter the build options in the Options text box:
   o Click the "..." button next to the Options text box to see the full list of options supported by the OpenCL standard.
   o Select the rows with build options you want to add from the Options table and click Add. The options appear in the Build Options text box.
3. Click OK to return to the main form.
See Also Choosing a Target Device
Choosing a Target Device
The Kernel Builder tool enables you to choose the target device (Intel® Architecture CPU or Intel® HD Graphics) when building OpenCL* code. The default device is CPU.
1. Select Build > Build/Compile Options...
2. In the Options window, under the Target OpenCL Device group box, select one of the available devices:
   o Intel® CPU
   o Intel® HD Graphics (GPU).
3. Click OK to return to the main window.
See Also Viewing the Generated LLVM Code
Viewing the Generated LLVM Code
To see the generated LLVM (Low Level Virtual Machine) IR from the OpenCL* code, follow these steps:
1. When the build successfully completes, click the Show LLVM button on the application toolbar or select View > LLVM View. The LLVM code appears in the LLVM Code text box, to the right of the OpenCL Code text box.
2. Click the Show LLVM button again to hide the LLVM Code text box.
See Also Viewing the Generated Assembly Code
Viewing the Generated Assembly Code
The Kernel Builder tool enables you to see the generated assembly code of the OpenCL* code.
NOTE This feature is available for the CPU device only.
To view the generated assembly code, do the following:
1. After the build successfully completes, click the Show Assembly button in the toolbar or select View > ASM View from the main menu. The assembly code appears in the Assembly Code text box, to the left of the OpenCL Code text box.
2. Click the Show Assembly button again to hide the Assembly Code text box.
Choosing a Different Assembly Code Style
Intel® SDK for OpenCL* Applications - Kernel Builder enables you to choose the assembly code style of the OpenCL code you build.
• Select Build > Build/Compile Options…
• In the Options window, under the Assembly Code Style group box, uncheck the Use default assembly code style check box and select the appropriate style from the combo box below. The available items are:
   o Intel style – Default on Windows* OS
   o AT&T style – Default on Linux* OS
Choosing a Different Target Instruction Set Architecture
The Kernel Builder tool enables you to choose the target instruction set architecture (ISA) when building OpenCL* code. It enables you to see the assembly code of different instruction set architectures and to generate program binaries for different hardware platforms.
1. Select Build > Build/Compile Options...
2. In the Options window, under the Instruction Set Architecture group box, uncheck the Use current platform architecture check box and select the appropriate ISA from the combo box below. The available items are:
   o Streaming SIMD Extension 4.2 (SSE4.2)
   o Advanced Vector Extension (AVX)
   o Advanced Vector Extension 2 (AVX2).
3. Click OK to return to the main window. The name of the target ISA appears in the top bar of the main window as an indicator, next to the file name.
Generating Intermediate Program Binaries
The Kernel Builder tool enables you to generate program binaries of the OpenCL* code. An application can later use the generated program binaries to create a program from binaries (clCreateProgramWithBinary(…)).
1. After the build successfully completes, click the Create Program Binary button.
2. Choose the location and the name of the program binary in the Save As dialog box and click OK to save the file.
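On the application side, the saved file has to be loaded into memory before a program object can be created from it. The following is a minimal sketch of that loading step in plain C; the function name read_program_binary is hypothetical and is not part of the SDK, and the sketch assumes the whole binary fits in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a newly allocated buffer.
 * On success returns the buffer (the caller frees it) and stores its size
 * in *size_out. Returns NULL if the file is missing, empty, or unreadable. */
unsigned char *read_program_binary(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    unsigned char *data = NULL;
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0) len = ftell(f);
    if (len > 0 && fseek(f, 0, SEEK_SET) == 0) {
        data = (unsigned char *)malloc((size_t)len);
        if (data != NULL && fread(data, 1, (size_t)len, f) != (size_t)len) {
            free(data);   /* short read: report failure */
            data = NULL;
        }
    }
    fclose(f);
    if (data != NULL) *size_out = (size_t)len;
    return data;
}
```

The returned buffer and size can then be passed as the binaries and lengths arguments of clCreateProgramWithBinary, followed by a clBuildProgram call for the same device.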
Saving Your Code to a Text File
The Kernel Builder tool enables you to save the generated Assembly/LLVM code, as well as the OpenCL* code you entered.
Click the Save As… button and select the type of code that you want to save (Assembly, LLVM, or OpenCL) or select File > Save.
Compiling Your OpenCL* Kernels
To compile your OpenCL* kernel, do the following:
1. Use the OpenCL Code text box to write your OpenCL code, or load the code from a file in one of the following ways:
   o Clicking the Open button in the toolbar
   o Selecting File > Open
   o Dragging the file to the application.
2. Click the Compile button in the commands toolbar, or select Build > Compile File from the main menu to compile the OpenCL* code.
3. Use the Console text box to see the compilation status. If the compilation succeeds, the text box background color turns green; otherwise, it turns red.
4. When compilation completes, you can save the compiled binary by clicking the Create Program Binary button.
Linking OpenCL* Program Binaries
To link OpenCL* program binaries, do the following:
1. Click the Link button in the commands toolbar, or select Build > Link Files from the main menu to link OpenCL* program binaries.
2. In the Select IR Files window, click the Choose Files button and select the compiled objects and libraries to be linked. Once you add all required compiled objects and libraries, click the Done button.
3. Look at the Console text box to see information on the linkage status. If the linkage succeeds, the text box background color turns green; otherwise, it turns red.
4. When linkage completes, you can save the created executable or library by clicking the Create Program Binary button.
See Also Analyzing the OpenCL* Kernel Performance
Configuring Link Options
To configure link options for the OpenCL* code, do the following:
1. In the Select IR Files window, click the Link Options button.
2. In the Options window, enter the link options in the Options text box.
   o Click the "…" button next to the Options text box to see the full list of options supported by the OpenCL standard.
Kernel Performance Analysis
Analyzing the OpenCL* Kernel Performance
To analyze your OpenCL* kernel performance, do the following:
1. Open the Analyze Board and push the Refresh kernel(s) button to get the list of kernels in your currently opened editor.
NOTE If only one kernel exists, it is chosen automatically.
2. In the Kernel Builder pull-down menu, choose the target kernel from the list of kernels in the currently open code. The Argument Table appears.
3. Create any buffers, images and samplers that your kernel requires, by using the Analyze menu or by clicking the assign variable cell in the table.
4. From the argument table, create variables to use as kernel arguments.
NOTE Create buffers, images and samplers before assigning them to kernel arguments. One-dimensional variables (such as integer, float, char, half, etc.) are created on-the-fly by entering a single value to the table. See section "Creating Variables" for more information.
5. Once you define the variables, assign them to kernel arguments, so that the application can pass them to the kernel while executing it.
NOTE The Arguments Table, including the assigned variables, automatically restores when you choose the kernel from the kernels list again.
6. You can view buffer and image contents by right-clicking a variable name or any of its detail lines and choosing the Show variable content option. A popup with the values/image appears. If a buffer is shown, you can edit its values and save it to a CSV file.
NOTE Saving buffers to CSV files changes the buffer definition, so it points to the new CSV.
7. Choose the global and local work group size per dimension for your workload. You can specify the local group sizes in several ways, depending on your analysis needs:
   o Single size
   o List of comma-separated sizes
   o 0 for the default local group size assigned by the framework
   o Auto – the tool iterates on all powers of two that are smaller than both the global work size and the device maximum local group size.
You can test each dimension independently, which means that you can use any of these options on each dimension. Use a single value to analyze the kernel in its designed conditions. Use the "Auto" option or a list of comma-separated values to find the local group size that provides the best performance.
8. To improve the analysis accuracy, run each global/local work group size combination several times (called iterations), which minimizes the impact of other system processes or tasks on the kernel execution time.
9. Use the device information dialog to compare device properties and to choose the appropriate device for the kernel.
10. Click the Analyze button to generate an application wrapping the specific kernel and to execute all the requested analyses with it.
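The candidate list that the "Auto" option in step 7 walks through can be pictured with a short sketch. This is only an illustration of the rule as worded above, not the tool's implementation: the function name auto_local_sizes is hypothetical, and the sketch reads "smaller than" literally, so the two bounds themselves are excluded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills out[] with the powers of two that are strictly
 * smaller than both global_size and max_local_size, in ascending order.
 * Returns how many values were written (at most out_capacity). */
size_t auto_local_sizes(size_t global_size, size_t max_local_size,
                        size_t *out, size_t out_capacity)
{
    assert(out != NULL);
    size_t limit = global_size < max_local_size ? global_size : max_local_size;
    size_t count = 0;
    for (size_t s = 1; s < limit && count < out_capacity; s *= 2) {
        out[count++] = s;
        if (s > (size_t)-1 / 2) break;  /* the next doubling would overflow */
    }
    return count;
}
```

For a global size of 1024 and a device maximum of 256, the sketch yields the eight candidates 1 through 128. Note that OpenCL 1.2 requires the global work size to be evenly divisible by the local work size in each dimension, so a real run may have to skip candidates that do not divide the global size.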
The Analysis Results tab has two parts. The top part shows the best and worst configurations of global and local group sizes, and a pull-down menu of all the configurations tested. The bottom part shows the tabs that provide the following data for the selected configuration:
• Execution statistics (average, median, standard deviation, minimum and maximum iteration) for the total execution, and their breakdown into the queue, submit and run portions.
• Statistics per iteration (total, submit, queue and execution).
• Variable handling statistics and view: read and read-back times, as well as output buffer/image viewing.
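The guide does not say how the tool measures the queue, submit and run portions. One plausible reading, assumed in the sketch below, is that they correspond to the intervals between the four standard OpenCL event profiling timestamps (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END), which are reported in nanoseconds. The type and function names are hypothetical.

```c
#include <assert.h>

/* Durations, in nanoseconds, of the stages of one kernel execution. */
typedef struct {
    unsigned long long queue;   /* queued    -> submitted */
    unsigned long long submit;  /* submitted -> started   */
    unsigned long long run;     /* started   -> finished  */
    unsigned long long total;   /* queued    -> finished  */
} stage_times;

/* Hypothetical helper: splits one execution into its three portions,
 * given the four timestamps of the corresponding event. */
stage_times split_stages(unsigned long long queued,
                         unsigned long long submitted,
                         unsigned long long started,
                         unsigned long long ended)
{
    assert(queued <= submitted && submitted <= started && started <= ended);
    stage_times t;
    t.queue  = submitted - queued;
    t.submit = started - submitted;
    t.run    = ended - started;
    t.total  = ended - queued;
    return t;
}
```

In an application, the four inputs would come from clGetEventProfilingInfo on the kernel's event, with the command queue created using the CL_QUEUE_PROFILING_ENABLE property.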
See Also Creating Variables
Creating Variables
To create variables in the system, do the following:
1. Select Analyze > Variable management.
2. In the Variable Management dialog, click Add.
3. In the Select Variable Type dialog, choose the desired type from the Type combo box.
Use CSV files, random values or zeroes to create buffers. When you use CSV files, each line represents one OpenCL* data type (like int4, float16, etc.), and the number of used lines determines the buffer size. The CSV file may hold more columns or lines than needed for a specific buffer, but not fewer.
NOTE Output buffers do not need a value assigned to them.
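To make the CSV layout concrete, the sketch below parses one line as a single int4 element. The exact CSV dialect that the tool accepts is not described here, so the sketch assumes plain comma-separated decimal integers; the function name parse_int4_line is hypothetical. It mirrors the rule above: extra columns are ignored, while a line with fewer than four values is rejected.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parses the first four comma-separated integers of
 * one CSV line into out[4]. Extra columns are ignored.
 * Returns 0 on success, -1 if fewer than four values are present. */
int parse_int4_line(const char *line, int out[4])
{
    assert(line != NULL && out != NULL);
    const char *p = line;
    for (int i = 0; i < 4; ++i) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p) return -1;        /* no number where one was expected */
        out[i] = (int)v;
        p = end;
        while (*p == ' ' || *p == '\t') ++p;
        if (i < 3) {
            if (*p != ',') return -1;   /* line ended before four values */
            ++p;
        }
    }
    return 0;
}
```

A buffer of N int4 elements would then be filled by calling the helper on the first N lines of the file.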
Use input bitmap files and the parameters from them to create images. Create output images with the correct size, type, channel order, and so on.
Use the Get output image data check box to disable reading back the output buffer/image. This is useful when you try more than one combination of global/local work sizes and do not need to read the same output for every combination. Both buffers and images enable you to choose memory options.
The tool does not restrict which options you select. Avoid option combinations that are forbidden by the OpenCL* 1.2 specification; otherwise, you will encounter errors upon analysis.
Editing Variables
To edit the variables in the system, do the following:
1. Select Analyze > Variable management.
2. In the Variable Management dialog, right-click a variable name.
3. In the menu that appears, click Edit Variable Properties.
4. In the dialog that appears, change the desired properties and click Done.
Viewing Variable Contents
To view the Buffer/Image contents, do the following:
1. Select Analyze > Variable management.
2. In the Variable Management dialog, right-click the name of the buffer or image you want to view, and then click Show Variable Contents, or expand the variable properties and double-click the Source line.
Deleting Variables
To delete variables, do the following:
1. Select Analyze > Variable Management.
2. In the Variable Management dialog, right-click a variable name.
3. In the menu that appears, click Delete Variable or Delete all Variables.
Configuration Results
The Analysis Results tab of the Kernel Builder enables you to see the best and worst configuration (based on median execution time) from the various global/local group sizes tested. If only one configuration is generated (when only one local group size per dimension was used), the result appears in both result windows.
To export or view the analysis results, do the following:
1. Go to the Analysis Results tab.
2. Select the needed tab:
   o Execution Statistics
   o Execution Iteration Times (ms)
   o Variable Handling.
3. Right-click the table and choose the action you need to perform:
   o Export Selected Configuration – save to CSV.
   o Export All Configurations – save to CSV.
   o Show All Configurations – present in a table dialog.
Statistics for Each Configuration
The Analysis Results tab of the Kernel Builder enables you to see statistical analysis results for a selected configuration. The statistics consist of the following iteration execution time values for the selected configuration:
• median
• average
• standard deviation
• maximum
• minimum.
The second table shows the breakdown of execution time into enqueue, submit, and run time. The third table shows the run time per iteration.
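For readers who export the per-iteration times and want to reproduce these figures, the following sketch computes them from an array of times. It is an illustration, not the tool's code: the names are hypothetical, and the sketch assumes the population form of the variance, since the guide does not state which form the tool uses. To stay free of the math library, it reports the variance; the standard deviation is its square root.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    double minimum, maximum, average, median;
    double variance;  /* population variance; standard deviation = sqrt(variance) */
} iteration_stats;

static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: summarizes n iteration times (n > 0).
 * Returns 0 on success, -1 if the sorted copy cannot be allocated. */
int compute_stats(const double *times_ms, size_t n, iteration_stats *out)
{
    assert(times_ms != NULL && out != NULL && n > 0);
    double *sorted = (double *)malloc(n * sizeof *sorted);
    if (sorted == NULL) return -1;
    memcpy(sorted, times_ms, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += sorted[i];
    out->average = sum / (double)n;

    double squares = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = sorted[i] - out->average;
        squares += d * d;
    }
    out->variance = squares / (double)n;
    out->minimum = sorted[0];
    out->maximum = sorted[n - 1];
    /* middle element, or the mean of the two middle elements */
    out->median = (n % 2 == 1) ? sorted[n / 2]
                               : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    free(sorted);
    return 0;
}
```

The median is less sensitive than the average to a single slow iteration caused by other system activity, which is consistent with the tool ranking configurations by median execution time.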
Statistics per Iteration
The Analysis Results tab of the Kernel Builder enables you to see the total run time and its breakdown into queue, submit and execute times per iteration for the given configuration.
See Also Variable Statistics
Variable Statistics
The Analysis Results tab of the Kernel Builder enables you to see read and read-back times for each variable, as well as the output file path for output parameters. Clicking this input/output path opens a popup with its content (images and buffers).
NOTE The analysis results restore each time you select the kernel from the kernel list.
Viewing Intel Devices
Click the Device Info button to see all Intel devices on your machine with all their properties.
Working with the Intel® SDK for OpenCL* Offline Compiler - Command-Line Version
The Intel® SDK for OpenCL* Applications 2013 - Offline Compiler provides a command-line utility for Linux* and Windows* OS. The command-line tool is located in the following directory:
• $(INTELOCLSDKROOT)\bin\x86 - for the 32-bit version of the tool
• $(INTELOCLSDKROOT)\bin\x64 - for the 64-bit version of the tool.
Use the command ioc32 -help (or ioc64 -help) to view help information on all available switches in the command window.
See Also Using Kernel Builder Integration with Microsoft Visual Studio*
Developing OpenCL* Applications Using Offline Compiler Plug-in
Using Offline Compiler Integration with Microsoft Visual Studio*
Intel® SDK for OpenCL* Applications 2013 provides a plug-in for Microsoft Visual Studio* software. The plug-in enables you to develop OpenCL applications with the Visual Studio IDE. The plug-in supports Visual Studio versions 2008, 2010 and 2012. The plug-in supports the following features:
• New project templates
• New OpenCL file (*.cl) template
• Syntax highlighting
• Types and functions auto-completion
• Offline compilation and build of OpenCL kernels
• LLVM code view
• Assembly code view
• Program IR generation
• Selection of target OpenCL device – CPU or GPU
NOTE To work with the plug-in features, you need to create an OpenCL project template or to convert an existing standard project into an OpenCL project. See sections "Creating new OpenCL Project Template" and "Converting an Existing Microsoft Visual Studio Project to OpenCL Project" for more information.
See Also Creating new OpenCL* Project Template Converting an Existing Microsoft Visual Studio Project to OpenCL Project
Creating new OpenCL* Project Template
To enable the OpenCL* features, you need to create an OpenCL template. To create a new OpenCL Project Template, do the following:
1. Go to: File > New > Project…
2. Select the OpenCL templates from the Installed Templates tree view (under Visual C++).
To add a new OpenCL kernel file to the project, do the following:
1. Go to: Project > Add New Item…
2. Select the OpenCL templates from the Installed Templates tree view (under Visual C++).
This plug-in works with the *.cl file extension.
Converting an Existing Microsoft Visual Studio Project to OpenCL Project
The Intel® SDK for OpenCL* Applications plug-in for Microsoft Visual Studio* software enables you to convert a standard C/C++ project to an Intel SDK for OpenCL Applications project and vice versa. To convert your project, do the following:
1. Right-click the project you want to convert in the Solution Explorer.
2. In the project menu, choose Convert to an Intel SDK for OpenCL project or Convert to a standard C/C++ project (only one appears, depending on the project type).
Building an OpenCL* Project
When the OpenCL* project is built, all OpenCL kernels in the kernel files are built as well with the Intel OpenCL compiler. You can see the build result in the Output Build Dialog in Visual Studio* software.
Using the OpenCL* Build Properties
The OpenCL* Build properties page enables you to set the compilation flags and change the target device when building an OpenCL kernel. To change the settings, do the following:
1. Go to: Project > Properties.
2. Look for the Intel SDK for OpenCL Applications dialog under the Configuration Properties group.
See Also Choosing a Target Device
Choosing a Target Device
The Intel® SDK for OpenCL* Applications - Kernel Builder integration with Microsoft Visual Studio* software enables you to choose the target device (Intel® Architecture CPU or Intel® HD Graphics) when building your OpenCL* code. The default device is CPU. To choose a target device, do the following:
1. Before the build, open the project properties dialog and go to Configuration Properties > Intel SDK for OpenCL Applications > General.
2. Change the Device option to the device of your choice.
Generating Intermediate Program Binaries
The Intel® SDK for OpenCL* Applications - Kernel Builder integration with Microsoft Visual Studio* software enables you to generate program binaries of the OpenCL code. An application can later use the generated program binaries to create a program from binaries (clCreateProgramWithBinary(…)). To generate intermediate program binaries, do the following:
1. Before the build, open the project properties dialog and go to Configuration Properties > Intel SDK for OpenCL Applications > General.
2. Change the Create Program Binary option to Yes.
Generating and Viewing LLVM Code
The Intel® SDK for OpenCL* Applications 2013 plug-in for Microsoft Visual Studio* software enables you to generate a file that contains the LLVM code for your *.cl file. To enable generating and viewing LLVM code, do the following:
1. Before the build, open the project properties dialog and go to Configuration Properties > Intel SDK for OpenCL Applications > General.
2. Change the Generate LLVM Code option to Yes.
After the build, you can open the generated LLVM file in the Visual Studio* editor by double clicking the message in the output view.
See Also Generating and Viewing Assembly Code
Generating and Viewing Assembly Code
The Intel® SDK for OpenCL* Applications plug-in for Microsoft Visual Studio* software enables you to generate a file that contains the assembly code for your *.cl file. To enable generating and viewing the assembly code, do the following:
1. Before the build, open the project properties dialog and go to Configuration Properties > Intel SDK for OpenCL Applications > General.
2. Change the Generate Assembly Code option to Yes.
After the build, you can open the generated assembly file in the Visual Studio editor by double clicking the message in the output view.
Configuring the Build Options
The Intel® SDK for OpenCL* Applications plug-in for Microsoft Visual Studio* software enables you to configure the build options for your OpenCL code. To configure the build options, do the following:
1. Before the build, open the project properties dialog and go to Configuration Properties > Intel SDK for OpenCL Applications > General.
2. Add your build options in the "Additional build options" line.
Using the Intel® SDK for OpenCL* - Debugger
Introducing the Intel® SDK for OpenCL* - Debugger
The Intel® SDK for OpenCL* Applications - Debugger is a Microsoft Visual Studio* plug-in that enables you to debug OpenCL kernels using the Microsoft Visual Studio software debugger GUI. The Intel SDK for OpenCL Applications - Debugger provides a debugging experience across host and OpenCL code, by supporting host code debugging and OpenCL kernel debugging in a single Microsoft Visual Studio debug session. The debugger supports existing Microsoft Visual Studio debugging windows such as:
• Breakpoints
• Memory view
• Watch variables – including OpenCL types like float4, int4, and so on
• Call stack
• Auto and local variables views
NOTE Debugging is available only for the CPU device. If you target your code to run on Intel® HD Graphics, you can debug it on the CPU device during the development phase and change the target device when ready.
For debugger limitations and known issues refer to the Intel SDK for OpenCL Applications 2013 Release Notes.
NOTE The Intel SDK for OpenCL Applications - Debugger works with Microsoft Visual Studio versions 2008, 2010 and 2012. Other versions of the Microsoft Visual Studio are not supported. You must acquire Microsoft Visual Studio software separately.
See Also Intel SDK for OpenCL Applications - Release Notes
Debugging Your OpenCL* Kernel with Intel® SDK for OpenCL* - Debugger
To work with the Intel® SDK for OpenCL* - Debugger plug-in, the OpenCL kernel code must exist in a text file separate from the code of the host. Debugging OpenCL code that appears only in a string embedded in the host application is not supported. Create your OpenCL project with the Offline Compiler plug-in for Microsoft Visual Studio* to get seamless integration with the Debugger. To debug an OpenCL kernel, follow these steps:
1. Enable debugging mode in the Intel OpenCL runtime for compiling the OpenCL code: add the -g flag to the build options string parameter of the clBuildProgram function.
2. Specify the full path of the file in the build options string parameter of the clBuildProgram function:
    -s <full path to the OpenCL source file>
If the path includes spaces, enclose the entire path in double quotes ("<path_to_opencl_source_file>"). For example:
    err = clBuildProgram(g_program, 0, NULL, "-g -s \"<path_to_opencl_source_file>\"", NULL, NULL);
According to the OpenCL standard, many work items execute the OpenCL kernels simultaneously. The Intel SDK for OpenCL - Debugger requires that you set the global ID of the work item that you want to debug in advance, before the debugging session starts. The debugger stops on breakpoints in OpenCL code only when the pre-set work item reaches them.
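When the source file location is only known at run time, the options string from steps 1 and 2 can be composed before the clBuildProgram call. The sketch below shows one way to do this; the function name make_debug_build_options is hypothetical, and the sketch always quotes the path, which also covers paths without spaces.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the options string  -g -s "<source_path>"
 * into buf. Returns 0 on success, -1 if buf is too small. */
int make_debug_build_options(const char *source_path, char *buf, size_t buf_size)
{
    assert(source_path != NULL && buf != NULL);
    int n = snprintf(buf, buf_size, "-g -s \"%s\"", source_path);
    return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```

The resulting buffer is then passed as the fourth argument of clBuildProgram in place of the string literal shown in the example above.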
Configuring Intel® SDK for OpenCL* - Debugger
To configure the Intel® SDK for OpenCL* Debugger, open the Debugging Configuration window:
1. Run Microsoft Visual Studio*.
2. Select Tools > Intel SDK for OpenCL - Debugger.
In the Basic Settings group box:
1. Check the Enable OpenCL Kernel Debugging check box to switch the Intel® SDK for OpenCL Applications - Debugger on or off.
2. Enter the appropriate values in the Select Work Items field to select work items. You can select only one work item. The values specify its 3D coordinates. If the NDRange has fewer than three dimensions (that is, 1D or 2D), you must leave the other dimensions at 0.
Troubleshooting the Intel® SDK for OpenCL* - Debugger
The Intel® SDK for OpenCL* Applications - Debugger needs a local TCP/IP port to work correctly. Occasionally, the debugger cannot use this port because of a collision with another application or with your firewall program. If you receive the "Protocol error. If the problem continues, try changing Intel OpenCL kernel debugger port" message, you may need to change the debugging port number and/or change your firewall settings. To change the debugging port number, do the following:
1. Open the OpenCL Debugging Configuration window.
2. Switch to the Advanced Settings group box.
3. Check the Use Custom Debugging Port check box.
4. In the Debugging Port Number field, enter the port you need.
Intel® SDK for OpenCL Applications - Debugger uses port 56203 by default.
Tuning with the Intel® Graphics Performance Analyzers
Interoperability with the Intel® Graphics Performance Analyzers
The Intel® SDK for OpenCL* Applications 2013 provides integration with the Intel® Graphics Performance Analyzers (Intel® GPA) to optimize and analyze OpenCL code in visual computing applications. You can use the following Intel GPA components:
1. Intel GPA System Analyzer for system utilization across the CPU and the Intel® HD Graphics.
2. Intel GPA Platform Analyzer for tracing the execution profiles of various OpenCL tasks on the CPU over a period of time. Execution on the Intel HD Graphics is not supported in this mode.
To download Intel GPA, go to: http://software.intel.com/en-us/articles/intel-gpa/.
See Also Intel® GPA website
Using the Intel® Graphics Performance Analyzers (Intel® GPA) Platform Analyzer
Introducing the Intel® Graphics Performance Analyzers Platform Analyzer
The Intel® GPA Platform Analyzer visualizes the execution profile of the various tasks in your code over time. The Intel GPA Monitor collects real-time trace data during the application run and saves it to a trace capture file. The Intel GPA Platform Analyzer opens this file and provides a system-wide picture of:
1. How your code executes on software and hardware threads.
2. How your code works with Intel® Media SDK and Microsoft* DXVA2, and how media-related workloads execute on various CPU and GPU cores in your system.
The tool infrastructure automatically aligns clocks across all cores in the entire system so that you can analyze CPU-based workloads together with GPU-based workloads within a unified time domain.
The Intel SDK for OpenCL Applications 2013 provides built-in capturing support that takes advantage of the Intel GPA Platform Analyzer for execution of OpenCL kernels on the CPU only. With the Intel GPA Platform Analyzer, you can trace the execution profiles of various OpenCL tasks on the CPU over a period of time and analyze them together with the graphics workloads (Microsoft DirectX* API) running on the GPU.
NOTE Intel SDK for OpenCL Applications 2013 is designed to work with Intel® GPA version 2012 R4 or higher.
Intel SDK for OpenCL Applications 2013 provides built-in capturing support with the following profiling features:
1. Context view (available on Intel GPA 4.2 or higher) - enables you to examine the flow of the OpenCL* commands and their dependencies within the application context command queues.
2. API tracing - enables you to capture and measure the time of the application OpenCL API calls.
3. GPU Counters - provide GPU profiling by inspecting a set of GPU performance counters. This is applicable only to OpenCL tasks that run on the Intel GPU device.
4. Device view (CPU only) - enables you to see the distribution of the application OpenCL commands (kernels and memory operations) across the system software threads.
You can use the generated trace capture file later to identify critical bottlenecks, analyze information on the execution flow, and improve your application performance.
Enabling OpenCL* API Tracing
The OpenCL* API Tracing feature enables you to view detailed timing information about your OpenCL* workload by inspecting the time the host API calls took to execute. It shows which parts of the host API took the most time to execute. The OpenCL* API Tracing is disabled by default. To enable it, set the following environment variable:

Environment Variable                    Default Value
CL_GPA_CONFIG_ENABLE_API_TRACING        True
Generating a Trace Capture File
You can generate OpenCL* traces using the Profiler mechanism or runtime instrumentation (for the CPU device only).

Using Profiler Mechanism
This mechanism uses a special DLL to capture the OpenCL information needed for Intel® GPA trace generation. This method enables all profiling features. You can use it to generate traces for tasks running on Intel® HD Graphics (or any other OpenCL device). To work in the profiling mode, you need to make several changes to the system registry. These changes include changing the target runtime DLL (instead of the OpenCL driver) and creating several registry variables used by the Profiler.

Using Runtime Instrumentation (CPU only)
This method generates OpenCL tracing information by reporting the current state within the OpenCL driver. This method provides a low level of information and is limited to tasks executed on the CPU device only.
NOTE The CPU runtime instrumentation is activated automatically when you switch to profiling mode. You can manually enable or disable it through the following environment variable:

Environment Variable                    Value
CL_CONFIG_USE_GPA                       True/False

Intel® SDK for OpenCL* Applications provides a configuration utility that helps you enable and disable the Profiling mode and makes all necessary changes in the system registry.
When you run the profiling configuration utility, the current working mode is selected. To switch the working mode, click the requested mode in the profiler configuration window, then click the Apply button.
NOTE The OpenCL Profiler Configuration Utility may cause an overhead penalty. Expect OpenCL programs to run slower. To restore performance, set back the original OpenCL settings after you finish using the Profiler to generate GPA traces. To set back the original OpenCL settings, run the configuration utility and choose Use Intel OpenCL Platform, then click Apply and Exit.
To generate a trace file, follow these steps:
1. Enable Intel GPA tracing: either launch the Profiling configuration utility and switch to profiling mode (if you use the Profiling mode), or set the required environment variable described in the Enabling OpenCL API Tracing section (if you use the runtime tracing mode).
2. Run the Intel GPA Monitor application.
3. Click the Manage Profiles... button and check the Capture Application Startup option under the Tracing tab.
4. Add the full path of the executable in the Command Line text box and click the Run button to run the application.
5. When the execution finishes, the trace file appears under the output folder of Intel GPA trace files, for example: C:\Users\<user name>\Documents\GPA_2012_R4 where <user name> is the name of your local user account.
You can load the generated OpenCL trace data file into the Intel GPA applications. After loading the trace file, task and marker items appear in the application timeline view.
See Also Enabling OpenCL API Tracing Intel® GPA Getting Started Guide
OpenCL* Device View
The OpenCL* Device View enables you to see the distribution of the application OpenCL* commands (kernels and memory operations) across the system software threads.

Tasks
Two types of tasks are available on the timeline tracker:
• The OpenCL kernels. Each task on the timeline tracker represents a cluster of workgroups that belong to the same kernel. All tasks that belong to the same kernel are marked in the same color; each kernel has a different color.
• Memory operations. In addition to the OpenCL kernels, there are tasks on the timeline tracker that represent memory operations: Read, Write, Copy, Map, and Unmap. All of these commands are colored red.
Markers
For each command in the queue, the trace enables you to see a marker on the time ruler, indicating the state of the command. The following markers are supported:
• A marker indicating the time the command was enqueued in the command queue.
• A marker indicating the time the command was submitted to the device.
• A marker indicating the time the command started running.
• A marker indicating the time the command completed.
The default view shows the queued and the completed markers only.
Controlling the Markers Display
You can change the markers display in the time ruler by adding and modifying the following environment variables:

Environment Variable                    Default Value
CL_GPA_CONFIG_SHOW_QUEUED_MARKER        True
CL_GPA_CONFIG_SHOW_SUBMITTED_MARKER     False
CL_GPA_CONFIG_SHOW_RUNNING_MARKER       False
CL_GPA_CONFIG_SHOW_COMPLETED_MARKER     True
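The marker variables can be set in the shell that launches the application. As an alternative, the sketch below sets them from the host program itself. It rests on an assumption that this guide does not confirm: that the OpenCL runtime reads these variables from the process environment when it initializes, so the calls must happen before the first OpenCL API call. The helper names set_env_var and show_all_markers are hypothetical.

```c
#if !defined(_WIN32)
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() on POSIX systems */
#endif

#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: sets one environment variable for the current
 * process. Returns 0 on success and -1 on failure. */
int set_env_var(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
#if defined(_WIN32)
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    return setenv(name, value, 1) == 0 ? 0 : -1;
#endif
}

/* Hypothetical helper: turns on all four markers listed in the table
 * above. Assumed to be called before any OpenCL API call. */
int show_all_markers(void)
{
    static const char *const names[] = {
        "CL_GPA_CONFIG_SHOW_QUEUED_MARKER",
        "CL_GPA_CONFIG_SHOW_SUBMITTED_MARKER",
        "CL_GPA_CONFIG_SHOW_RUNNING_MARKER",
        "CL_GPA_CONFIG_SHOW_COMPLETED_MARKER",
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if (set_env_var(names[i], "True") != 0)
            return -1;
    }
    return 0;
}
```

If the runtime does not pick up variables set this way, set them in the environment of the launching process instead.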
OpenCL* Context View
The OpenCL* Context View enables you to examine the flow of the OpenCL commands and their dependencies within the application context command queues. You can view detailed timing information about your OpenCL workload by inspecting the time required to queue commands for the device, and the time the device took to execute items from the queue. This shows you which commands took the most time to execute.
Each track in the context view represents an OpenCL command queue. These tracks can be easily identified by the turquoise color of their left headers. Two types of tracks exist:
• In-order queue (representing an in-order OpenCL command queue).
• Out-of-order queue (representing an out-of-order OpenCL command queue).
The tasks within these tracks represent OpenCL commands (OpenCL kernels or memory operations). All commands have different colors. Each task is logically divided into two parts:
• The wait time of an OpenCL command in the queue, before it is scheduled for execution on the device.
• The actual execution time of an OpenCL command.
The two parts share the same color, although with different brightness levels. The colors of the commands in the context view tracks are identical to the colors of the corresponding commands in the device view tracks.
GPU Counters
When using the Profiler, you can inspect GPU counter values for OpenCL* tasks running on the Intel® HD Graphics (GPU) device. You can access GPU counters by clicking any OpenCL task in the Context view and then clicking the Metadata tab, which appears in the bottom-right window.

GPU Counters Description

Counter Name: Compute Shader Active Time
Counter Explanation: Total time the compute shader spent active on all cores.

Counter Name: Compute Shader Stall Time – Core Stall
Counter Explanation: Total time in clocks the compute shader spent stalled on all cores – and the entire core was stalled as well.

Counter Name: Number of CS threads loaded
Counter Explanation: Number of CS threads loaded at any given time in the EUs.
Using the Intel® Graphics Performance Analyzers System Analyzer
Intel® Graphics Performance Analyzers 2013 (Intel® GPA) introduced support for collecting various metrics in the Intel GPA System Analyzer. You can use the following types of OpenCL*-related metrics:
1. CPU-specific metrics, such as core utilization.
2. Execution unit (EU) metrics, of which GPU EU Active, Idle, and Stalled are the most important.
3. Memory metrics, such as GPU Memory Read and Write operations.
4. Power metrics for the CPU, Intel® HD Graphics, and both devices simultaneously.
Refer to the Intel GPA Online Help for details.
You can also use the Intel GPA System Analyzer to profile non-DirectX* applications if they are instrumented with the Intel® ITT API. In this case the Metric Tree Control panel shows only CPU metrics.
NOTE The “GPU busy” metric does not include general computations like OpenCL*, so it actually means that the GPU is currently processing 3D rendering contexts.
The following screenshot shows the Intel GPA System Analyzer in action:
The Metric Tree Control Panel on the left side displays the tree of available metrics. The Metric Chart Area shows up to 16 charts that visualize the metrics performance. To add a metric to the chart, drag and drop it from the list of available metrics. In the screenshot above, you can see the following metrics beyond the general “Frame Time” chart:
• Family of execution unit metrics for this specific application. The percentage of GPU execution units stalled is quite high, which might indicate high bandwidth pressure (below) or a stall in other fixed function units, such as mathbox.
• GPU Memory Reads/Writes. The traffic of read/write operations is intensive. Use the Intel® SDK for OpenCL* Applications Optimization Guide for hints on memory performance.
Refer to the Intel GPA Online Help for information on how to inspect metrics and find performance bottlenecks.
See Also Intel SDK for OpenCL Applications - Optimization Guide Intel GPA website
Interoperability with the Intel® VTune™ Amplifier XE
Introducing the Intel® VTune™ Amplifier XE
Intel® SDK for OpenCL* Applications 2013 enables you to see the just-in-time (JIT) compiled assembly code of your OpenCL kernels and to analyze its performance through sampling profiling (call-graph profiling is not supported in version 2012) using the graphical interface of the Intel® VTune™ Amplifier XE Performance Profiler tool. You must acquire the Intel VTune Amplifier XE 2011 separately. For more information, see the VTune Amplifier XE 2011 page at http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/.
NOTE The profiling support of Intel SDK for OpenCL Applications 2013 is designed to work with the Intel VTune Amplifier XE 2011 and Intel VTune Performance Analyzer 9.1 or higher. Using other versions may cause undefined results.
See Also Setting a New Profiling Project Intel® VTune™ Amplifier XE 2011 at Visual Computing Source portal
Setting a New Profiling Project
The instructions below refer to the Intel® VTune™ Amplifier XE 2011. Note that the user interface of the Intel VTune Performance Analyzer may differ slightly.
To run profiling on the Intel VTune Amplifier XE, follow these steps:
1. Create a new sampling project by selecting File > New > Project...
2. In the Create a Project dialog, enter your new project name and click Create Project.
3. In the Project Properties window, select the application to run, including the working directory and command arguments.
4. Click Modify... next to the user-defined environment variables text box and add the following line to the User-defined Environment Variables table: CL_CONFIG_USE_VTUNE=True
5. Click OK in all open windows to save the new settings.
6. Click the New Analysis button to run and analyze your application.
7. In the Analysis Type window, select the type you need and click Start to run the analysis.
Viewing the OpenCL* Kernel’s Assembly Code
To view the OpenCL* kernel assembly code, do the following:
1. Wait until the sampling activity finishes.
2. Click the Hotspots Bottom-up button on the navigation toolbar.
3. Select the /Function option in the Data Grouping selection box and look for your application OpenCL kernels in the Functions data grid:
If you are running several applications with identical OpenCL kernel names simultaneously, select the /Process /Function /Thread option in the Data Grouping selection box and look for the application process in the Processes data grid. The OpenCL kernel that belongs to the application is located under the application name in the application tree view, as shown in the example below:
4. Double-click the selected OpenCL kernel to see the assembly source code and its relevant sampling information:
Source Level Profiling of Your OpenCL* Kernel
Through the source level profiling mode, you can perform hotspots analysis against the original OpenCL* kernel code. Hotspots analysis enables you to analyze kernel performance and bottlenecks of an OpenCL* kernel through sampling profiling using the GUI of the Intel® VTune™ Amplifier XE Performance Profiler tool.
In the build options string parameter of the clBuildProgram function:
1. Add the -profiling flag.
2. Specify the full path of the source file:
   -s <full path to the OpenCL* source file>
   If the path includes spaces, enclose the entire path in double quotes ("<path_to_opencl_source_file>").
For example:
   err = clBuildProgram(g_program, 0, NULL, "-profiling -s \"<path_to_opencl_source_file>\"", NULL, NULL);
See Also Interoperability with the Intel® VTune™ Amplifier XE on Linux* OS
OpenCL* Tracing Support with the Intel® VTune™ Amplifier XE
Introducing OpenCL* Tracing
The Intel® VTune™ Amplifier XE provides an experimental feature that supports visual display of the execution profile of the various tasks in your code over a period of time. The tool collects real-time trace data during the application run and provides a thread-oriented view of your application. Besides OpenCL* commands, it also collects system-level threading issues and Intel® Threading Building Blocks (Intel® TBB) events, to give you a detailed view of the running system.
To download Intel VTune Amplifier XE, go to: http://software.intel.com/en-us/articles/vcsource-tools-vtune-amplifier-xe/.

Capturing OpenCL* Trace
To configure the Intel® VTune™ Amplifier XE for capturing an OpenCL trace, do the following:
1. Set the environment variables:
   o AMPLXE_EXPERIMENTAL=all
   o CL_CONFIG_USE_GPA=true
   o CL_CONFIG_USE_ITT=true
2. In the Intel VTune Amplifier XE, open the New Analysis window and select the Analyze user tasks check box.
3. Click Start.
4. Once the collection is done, switch to the Bottom-Up view to see the tasks collected in the timeline:
5. For task-focused analysis, switch to the Task Analysis viewpoint:
6. Use the Task Time and CPU Time panes to view the task data:
Appendix A - Supported Images Formats

Read-Only Surface Formats

cl_channel_order  cl_channel_type     GPU Device  CPU Device
CL_RGBA           CL_UNORM_INT8       Yes         Yes
CL_RGBA           CL_UNORM_INT16      Yes         Yes
CL_RGBA           CL_SIGNED_INT8      Yes         Yes
CL_RGBA           CL_SIGNED_INT16     Yes         Yes
CL_RGBA           CL_SIGNED_INT32     Yes         Yes
CL_RGBA           CL_UNSIGNED_INT8    Yes         Yes
CL_RGBA           CL_UNSIGNED_INT16   Yes         Yes
CL_RGBA           CL_UNSIGNED_INT32   Yes         Yes
CL_RGBA           CL_HALF_FLOAT       Yes         Yes
CL_RGBA           CL_FLOAT            Yes         Yes
CL_R              CL_FLOAT            Yes         No
CL_R              CL_UNORM_INT8       Yes         No
CL_R              CL_UNORM_INT16      Yes         No
CL_R              CL_SIGNED_INT8      Yes         No
CL_R              CL_SIGNED_INT16     Yes         No
CL_R              CL_SIGNED_INT32     Yes         Yes
CL_R              CL_UNSIGNED_INT8    Yes         No
CL_R              CL_UNSIGNED_INT16   Yes         No
CL_R              CL_UNSIGNED_INT32   Yes         Yes
CL_R              CL_HALF_FLOAT       Yes         No
CL_INTENSITY      CL_UNORM_INT8       Yes         No
CL_INTENSITY      CL_UNORM_INT16      Yes         No
CL_INTENSITY      CL_HALF_FLOAT       Yes         No
CL_INTENSITY      CL_FLOAT            Yes         Yes
CL_LUMINANCE      CL_UNORM_INT8       Yes         No
CL_LUMINANCE      CL_UNORM_INT16      Yes         No
CL_LUMINANCE      CL_HALF_FLOAT       Yes         No
CL_LUMINANCE      CL_FLOAT            Yes         Yes
CL_A              CL_UNORM_INT8       Yes         No
CL_A              CL_UNORM_INT16      Yes         No
CL_A              CL_HALF_FLOAT       Yes         No
CL_A              CL_FLOAT            Yes         No
CL_RG             CL_UNORM_INT8       Yes         No
CL_RG             CL_UNORM_INT16      Yes         No
CL_RG             CL_SIGNED_INT8      Yes         No
CL_RG             CL_SIGNED_INT16     Yes         No
CL_RG             CL_SIGNED_INT32     Yes         No
CL_RG             CL_UNSIGNED_INT8    Yes         No
CL_RG             CL_UNSIGNED_INT16   Yes         No
CL_RG             CL_UNSIGNED_INT32   Yes         No
CL_RG             CL_HALF_FLOAT       Yes         No
CL_RG             CL_FLOAT            Yes         No

Write-Only Surface Formats

cl_channel_order  cl_channel_type     GPU Device  CPU Device
CL_RGBA           CL_UNORM_INT8       Yes         Yes
CL_RGBA           CL_UNORM_INT16      Yes         Yes
CL_RGBA           CL_SIGNED_INT8      Yes         Yes
CL_RGBA           CL_SIGNED_INT16     Yes         Yes
CL_RGBA           CL_SIGNED_INT32     Yes         Yes
CL_RGBA           CL_UNSIGNED_INT8    Yes         Yes
CL_RGBA           CL_UNSIGNED_INT16   Yes         Yes
CL_RGBA           CL_UNSIGNED_INT32   Yes         Yes
CL_RGBA           CL_HALF_FLOAT       Yes         Yes
CL_RGBA           CL_FLOAT            Yes         Yes
CL_R              CL_FLOAT            Yes         No
CL_R              CL_UNORM_INT8       Yes         No
CL_R              CL_UNORM_INT16      Yes         No
CL_R              CL_SIGNED_INT8      Yes         No
CL_R              CL_SIGNED_INT16     Yes         No
CL_R              CL_SIGNED_INT32     Yes         Yes
CL_R              CL_UNSIGNED_INT8    Yes         No
CL_R              CL_UNSIGNED_INT16   Yes         No
CL_R              CL_UNSIGNED_INT32   Yes         Yes
CL_R              CL_HALF_FLOAT       Yes         No
CL_INTENSITY      CL_UNORM_INT8       No          No
CL_INTENSITY      CL_UNORM_INT16      No          No
CL_INTENSITY      CL_HALF_FLOAT       No          No
CL_INTENSITY      CL_FLOAT            No          Yes
CL_LUMINANCE      CL_UNORM_INT8       No          No
CL_LUMINANCE      CL_UNORM_INT16      No          No
CL_LUMINANCE      CL_HALF_FLOAT       No          No
CL_LUMINANCE      CL_FLOAT            No          Yes
CL_A              CL_UNORM_INT8       Yes         No
CL_A              CL_UNORM_INT16      No          No
CL_A              CL_HALF_FLOAT       No          No
CL_A              CL_FLOAT            No          No
CL_RG             CL_UNORM_INT8       Yes         No
CL_RG             CL_UNORM_INT16      Yes         No
CL_RG             CL_SIGNED_INT8      Yes         No
CL_RG             CL_SIGNED_INT16     Yes         No
CL_RG             CL_SIGNED_INT32     Yes         No
CL_RG             CL_UNSIGNED_INT8    Yes         No
CL_RG             CL_UNSIGNED_INT16   Yes         No
CL_RG             CL_UNSIGNED_INT32   Yes         No
CL_RG             CL_HALF_FLOAT       Yes         No
CL_RG             CL_FLOAT            Yes         No

Read-Write Surface Formats

cl_channel_order  cl_channel_type     GPU Device  CPU Device
CL_RGBA           CL_UNORM_INT8       Yes         Yes
CL_RGBA           CL_UNORM_INT16      Yes         Yes
CL_RGBA           CL_SIGNED_INT8      Yes         Yes
CL_RGBA           CL_SIGNED_INT16     Yes         Yes
CL_RGBA           CL_SIGNED_INT32     Yes         Yes
CL_RGBA           CL_UNSIGNED_INT8    Yes         Yes
CL_RGBA           CL_UNSIGNED_INT16   Yes         Yes
CL_RGBA           CL_UNSIGNED_INT32   Yes         Yes
CL_RGBA           CL_HALF_FLOAT       Yes         Yes
CL_RGBA           CL_FLOAT            Yes         Yes
CL_R              CL_FLOAT            Yes         No
CL_R              CL_UNORM_INT8       Yes         No
CL_R              CL_UNORM_INT16      Yes         No
CL_R              CL_SIGNED_INT8      Yes         No
CL_R              CL_SIGNED_INT16     Yes         No
CL_R              CL_SIGNED_INT32     Yes         Yes
CL_R              CL_UNSIGNED_INT8    Yes         No
CL_R              CL_UNSIGNED_INT16   Yes         No
CL_R              CL_UNSIGNED_INT32   Yes         Yes
CL_R              CL_HALF_FLOAT       Yes         No
CL_INTENSITY      CL_UNORM_INT8       No          No
CL_INTENSITY      CL_UNORM_INT16      No          No
CL_INTENSITY      CL_HALF_FLOAT       No          No
CL_INTENSITY      CL_FLOAT            No          Yes
CL_LUMINANCE      CL_UNORM_INT8       No          No
CL_LUMINANCE      CL_UNORM_INT16      No          No
CL_LUMINANCE      CL_HALF_FLOAT       No          No
CL_LUMINANCE      CL_FLOAT            No          Yes
CL_A              CL_UNORM_INT8       Yes         No
CL_A              CL_UNORM_INT16      No          No
CL_A              CL_HALF_FLOAT       No          No
CL_A              CL_FLOAT            No          No
CL_RG             CL_UNORM_INT8       Yes         No
CL_RG             CL_UNORM_INT16      Yes         No
CL_RG             CL_SIGNED_INT8      Yes         No
CL_RG             CL_SIGNED_INT16     Yes         No
CL_RG             CL_SIGNED_INT32     Yes         No
CL_RG             CL_UNSIGNED_INT8    Yes         No
CL_RG             CL_UNSIGNED_INT16   Yes         No
CL_RG             CL_UNSIGNED_INT32   Yes         No
CL_RG             CL_HALF_FLOAT       Yes         No
CL_RG             CL_FLOAT            Yes         No

Appendix B – OpenCL* Build and Linking Options

Preprocessor Options

-D <name>
  Description: Predefines name as a macro, with definition 1.
  GPU Device: Yes
  CPU Device: Yes

-D <name=definition>
  Description: The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded newline characters.
  GPU Device: Yes
  CPU Device: Yes

-I <dir>
  Description: Adds the directory <dir> to the list of directories to be searched for header files.
  GPU Device: Yes
  CPU Device: Yes

Math Intrinsics Options

-cl-single-precision-constant
  Description: Treats double precision floating-point constants as single precision constants.
  GPU Device: Yes
  CPU Device: No

-cl-denorms-are-zero
  Description: This option controls how single precision and double precision denormalized numbers are handled. If specified as a build option, the single precision denormalized numbers may be flushed to zero; double precision denormalized numbers may also be flushed to zero if the optional extension for double precision is supported.
  GPU Device: No
  CPU Device: Yes

Optimization Options

-cl-opt-disable
  Description: This option disables all optimizations. Optimizations are enabled by default.
  GPU Device: No
  CPU Device: Yes

-cl-mad-enable
  Description: Enables a * b + c to be replaced by a mad. The mad computes a * b + c with reduced accuracy.
  GPU Device: Yes
  CPU Device: No

-cl-no-signed-zeros
  Description: Enables optimizations for floating-point arithmetic that ignore the signedness of zero. IEEE 754 arithmetic specifies the distinct behavior of +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -cl-finite-math-only). This option implies that the sign of a zero result is not significant.
  GPU Device: No
  CPU Device: No

-cl-unsafe-math-optimizations
  Description: Enables optimizations for floating-point arithmetic that:
    1. assume that arguments and results are valid
    2. may violate the IEEE 754 standard
    3. may violate the OpenCL* numerical compliance requirements.
  GPU Device: Yes
  CPU Device: No

-cl-finite-math-only
  Description: Enables optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or ±∞.
  GPU Device: Yes
  CPU Device: No

-cl-fast-relaxed-math
  Description: Sets the optimization options -cl-finite-math-only and -cl-unsafe-math-optimizations. This enables optimizations for floating-point arithmetic that may violate the IEEE 754 standard and the OpenCL* numerical compliance requirements.
  GPU Device: Yes
  CPU Device: Yes

Options for Warnings

-w
  Description: Inhibits all warning messages.
  GPU Device: Yes
  CPU Device: Yes

-Werror
  Description: Makes all warnings into errors.
  GPU Device: Yes
  CPU Device: Yes

Options Controlling the OpenCL* C Version

-cl-std=
  Description: Determines the OpenCL* C language version to use.
  GPU Device: Yes
  CPU Device: Yes
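
Several of the options above are usually combined into one space-separated string that is passed as the build options parameter of clBuildProgram. The following is a minimal sketch of assembling such a string; the helper name append_build_option and the macro name TILE are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): appends one option to a
 * space-separated, NUL-terminated options string held in a buffer of
 * `size` bytes. Returns 0 on success, or -1 if the option does not fit,
 * in which case the buffer is left unchanged. */
int append_build_option(char *options, size_t size, const char *option)
{
    size_t used, need;

    assert(options != NULL && option != NULL);
    used = strlen(options);
    /* One extra byte for the separating space unless the string is empty. */
    need = strlen(option) + (used > 0 ? 1 : 0);
    if (used + need + 1 > size)
        return -1;

    if (used > 0)
        options[used++] = ' ';
    memcpy(options + used, option, strlen(option) + 1);
    return 0;
}
```

For example, starting from an empty buffer and appending -cl-fast-relaxed-math, -D TILE=16, and -Werror yields the string "-cl-fast-relaxed-math -D TILE=16 -Werror". All three options are listed above as supported on both the GPU and the CPU device.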