Posts Tagged ‘Data’

Parallel Building Blocks – Vectorize Inside the Cilk ‘for’ with Array Notations

100 blogs and 100 videos in 100 days
Blog 4: Vectorize inside the Cilk ‘for’ with Array Notations
Part 1: Why vectorize?

In the previous blog, I gave a very fast crash course on the Cilk ‘for’. There are a few other ways you can manipulate that for loop, but I will go over those another time. The key is to understand what is possible first, get the basics running and parallelized, and use the more expert features later. I think people new to parallel programming get too caught up in speedups and start using the full gamut of Cilk features. I really feel the best course of action with any of the PBB models is the following:

1. Get it to scale across some cores with threading. Use some of the Parallel Studio features to understand how many cores are being scaled to based on your data size.
2. See how you can vectorize. I will discuss a very convenient option for vectorization, the Array Notations feature, here.

Even for parallel programming veterans, vectorization can be a scary black art.

DEFINITION: Vectorization takes code that performs operations on individual operands and utilizes Intel® Streaming SIMD Extensions to perform those operations on multiple sets of data in parallel.

This means you’re using one of the possible SIMD instruction sets (right now usually a variant of SSE, or AVX on Sandy Bridge) to take advantage of parallelism *inside* each individual core. If you just use the Cilk ‘for’, threads are spawned across some number of cores, but the SIMD registers are, for the most part, not being utilized, apart from whatever vectorization the compiler does on its own, which, by the way, you have to understand and specify yourself (the compiler flags and pragmas are not that hard – we’ll discuss them in another blog). In other words, the lanes are not filled. You need to fill them!

Thread parallelism: Take advantage of many cores: divide & conquer.
Vector parallelism: CPU instructions that process multiple data elements.

You need both to maximize that fancy new processor. Otherwise you’re leaving some performance on the table.

What’s the philosophy behind PBB vectorization? Enable all developers to use the vector hardware (SSE instructions) in the CPU easily, without having to use intrinsic functions or inline assembly. A convenient consequence of generated code is portability: you don’t have to rewrite your SIMD code every time a new processor or instruction set comes out.

Conveniently, Cilk™ Plus has two components: the Cilk component for threading and the Array Notations component for vectorization. As I explained in the previous blog, these are actual language extensions, not templates. No new data types are added, so the compiler makes its decisions at compile time based on the code you wrote. The language extension makes it very easy to convert your existing code.

So what do these Array Notations look like? It’s a colon/bracket style syntax, and there are lots of things you can do within the language to manipulate the colon/brackets. Behind the scenes the compiler generates tailored SIMD code based on your expression.

In an expression such as A[0:N], A is a C/C++ array or pointer variable, 0 is the lower bound, and N is the length. Note that the second value is a length, not an end index. This is in the spirit of memcpy(), which also works from a starting address and a length, as in memcpy(dest, src, length). So A[0:N] means that we want to do a parallel operation on A[0], A[1], A[2], …, A[N-1]. We do not need special vector or matrix types for parallel operations.

So you write your Cilk ‘for’ loop and, inside the loop, express your arrays/vectors/matrices/cubes/hypercubes/whatever_you_want_to_call_them with this new syntax and define the operations you would like done on them. In the next blog entries, I will go into more detail on the syntax with code samples.
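To make the lower-bound-plus-length reading concrete, here is a minimal sketch in standard C++ rather than in Cilk Plus itself, so it builds with any compiler. The helper name add_sections and its parameters are hypothetical and exist only for illustration. The array-notation statement appears only in a comment, and whether the loop is actually turned into SIMD instructions depends on your compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the plain-loop meaning of the array-notation statement
//     c[lo:len] = a[lo:len] + b[lo:len];
// that is, an elementwise add over indices lo, lo+1, ..., lo+len-1.
void add_sections(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, std::size_t lo, std::size_t len) {
    // The section must fit inside all three arrays.
    assert(lo + len <= a.size());
    assert(lo + len <= b.size());
    assert(lo + len <= c.size());

    // No iteration depends on another, so a compiler is free to map
    // several iterations onto a single SIMD instruction.
    for (std::size_t i = 0; i < len; ++i) {
        c[lo + i] = a[lo + i] + b[lo + i];
    }
}
```

Calling add_sections(a, b, c, 1, 2) touches only c[1] and c[2], which mirrors how a section with lower bound 1 and length 2 selects two elements rather than the range "1 to 2".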

Parallel Building Blocks – ‘for’ Loop Considerations

100 Blogs and 100 Videos in 100 days about Parallel Building Blocks
Blog 1: Parallel Building Blocks – ‘for’ Loop Considerations

So you’ve used Intel Parallel Amplifier to discover that a sizable percentage of your processing time is being spent inside a very hairy for loop. Task parallel or data parallel algorithm considerations aside, your first reaction is: how can I get this for loop, at the very least, parallelized across some or all of the cores? (Note: these considerations vary depending on data size. There are cases where parallelizing across all cores does not warrant the overhead of distributing the work across those cores.)

Each of the PBB models offers a different flavor of for loop, each with its similarities to and differences from the others. I could easily just show you the syntax and usage for all of the ‘for’ loops available, but that isn’t solving your problem at all. I am part of a customer-centric group of consulting engineers at Intel; we create and find solutions within the set of DPD tools to solve problems. The fact is that regardless of your build environment and development toolchain, we have at least one solution for your parallelism needs (many times there are several, hence the need to explain). So before we go into syntax, usage, and what goes on behind the scenes, I’m going to take a step back and help you understand exactly what type of for loop you are dealing with and which build environment it can play well in.

The first part of the decision process is template library vs. language extension vs. template library + language extension. Some development groups don’t like, or refuse to use, any templates. That’s fine. Others want the purity of a language-extension-free build environment and will only use extensions once they are part of the standard. No problem. Then we find our favorite group, those that will use both. That really opens the floodgates of options.
Template Library but NO Language Extension

If you want a template library and will not use a language extension to get some degree of parallelism for the ‘for’ loop, you have these two choices: Intel Threading Building Blocks (TBB) vs. Intel Array Building Blocks (ArBB).

TBB

TBB has a high-level syntax where you manage tasks, not individual threads, for task parallel programming. So this will help you take advantage of parallelism in-between cores. It does not have implicit vectorization, but you do have the following options to add parallelism inside each individual core:
– pragmas
– compiler flags
– API calls from another template library that will vectorize for you
– explicit vectorization (writing your own AVX/SSE code)

ArBB

ArBB is the data parallel analogue to TBB. It uses TBB for threading. It has a higher-level syntax than TBB that, if used properly in conjunction with ‘map’ (I will go into that in subsequent blogs), allows for both vectorization and threading without you even having to manage tasks. This is well suited for algorithm scientists who do not want to manage the specifics of either tasks or vectorization.

Language Extension but NO Template Library

Intel Cilk™ Plus

The Cilk component of Cilk™ Plus has a keyword-style syntax that will spawn threads for you to do the ‘for’ loop computation. There is no implicit vectorization just by using the keyword; it just parallelizes in-between cores. However, you do have additional options for vectorization. The Array Notations component of Cilk™ Plus will allow you to vectorize.
Vectorization options for the Cilk component if you need a language extension but NO template library:
– pragmas
– compiler flags
– explicit vectorization
– Array Notations component of Cilk™ Plus

If you want a language extension and have allowances for template libraries, you can use these for vectorization:
– pragmas
– compiler flags
– API calls from another template library that will vectorize for you
– Array Notations component of Cilk™ Plus
– explicit vectorization (writing your own AVX/SSE code)

If template library vs. language extension vs. both is a non-issue for you, the issue becomes TIME and DEGREES of freedom. I will discuss that in the next blog post and video.
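The earlier note that parallelizing across all cores does not always justify the distribution overhead can be sketched in standard C++. This is only an illustration under stated assumptions: chunked_for and its min_parallel_size parameter are hypothetical names, not part of TBB, ArBB, or Cilk Plus, and those models schedule work far more cleverly than this fixed chunking does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not an API of any PBB model.
// Runs body(begin, end) over [0, n), split into contiguous chunks.
// Below min_parallel_size the loop stays serial, because starting
// threads can cost more than a small loop saves.
void chunked_for(std::size_t n, std::size_t min_parallel_size,
                 const std::function<void(std::size_t, std::size_t)>& body) {
    if (n == 0) {
        return;  // nothing to do
    }
    unsigned hw = std::thread::hardware_concurrency();  // may report 0
    std::size_t workers = (hw == 0) ? 1 : hw;
    if (n < min_parallel_size || workers == 1) {
        body(0, n);  // serial fallback
        return;
    }
    workers = std::min(workers, n);
    std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back(body, begin, end);  // one thread per chunk
    }
    for (std::thread& t : pool) {
        t.join();
    }
}
```

Each chunk receives a disjoint index range, so a body that only writes to its own range needs no locking. The loop inside the body is still ordinary scalar code, which is where the vectorization options listed above come in.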

ProtectStar Data Shredder

The new ProtectStar Data Shredder provides an intelligent solution for deleting sensitive data. The compact and user-friendly software package will enable users to delete their data irreversibly from their hard drives, external drives, USB-Sticks, SD-Cards, etc. while preventing sensitive data from… [ File Shredder ]

XL Toolbox 2.81

Daniel's XL Toolbox is a free add-in for Microsoft® Excel® for researchers in the biomedical and other sciences. Its worksheet management features also make it a useful general-purpose add-in. It offers ANOVA, smart custom error bars, spread scatter and much more. There are several commercial software packages for scientific data analysis and presentation. While their statistical and scientific charting functions are unmatched, they are themselves no match for Excel's flexibility when it comes to organizing your data. Often, researchers end up having their data in several files: most is stored in Excel®, and for specific analysis and presentation tasks, some of it is copied over to one of the dedicated scientific packages. If you prefer to store, comment and analyze your data in one place, you should try Daniel's XL Toolbox.
Homepage : http://xltoolbox.sourceforge.net/
Download : XL_Toolbox_2.81.exe
File Size : 9.60MB

ChateX 1.17

ChateX is a new all-in-one client-server chat program built for speed. The program has a secure protocol that prevents crashes, as all data is verified when it is received on both the client and the server side. All connections can be monitored by admins. The program is also useful over a LAN.

What’s New in version 1.17:
– Many important bug fixes
– Added messages count to the system tray icon’s tooltip and to the private chat window titlebar when it is out of focus
– Added percentage to the file transfer window titlebar
– Added the file name to the Open file and Save file windows
– Added notification messages to the main chat window whenever a file is received or sent successfully
– Added URL detection. Chat links can be enabled from Options to be automatically opened in the browser when clicked
– Private chat now works by user name instead of user ID
– Added notification on main chat when somebody changes name
– ChateX now attempts to reconnect to the chat server when losing connection. This can be disabled in Options
– Added a list of recently used servers on the main window
– Added chat logging. This can be disabled in Options

Homepage : http://www.chatex.tk/
Download : ChateX%201.17%20Installer.zip
File Size : 492.5KB

EaseUs Disk Copy

EaseUs Disk Copy enables you to create an exact copy/clone of your internal hard drive, by copying the data from one drive to the other, including all operating system settings, programs, data etc. This is useful when you are upgrading or replacing your hard drive as it eliminates the need to… [ Partition Manager ]

Red Data Safe

Keep all your passwords in one place, safe from curious eyes. With Red Data Safe you keep all your login information clearly laid out in a table and saved. A master password that you set up at the first program start protects the program, and thus your data, from unauthorized persons. The… [ Password Manager ]

Intel® AMT Developers, SOAP, and Migration Utilities

Way back in November of ’10, I wrote a blog urging Intel AMT developers to stop using SOAP. The reason was that the APIs available in the Intel AMT Software Development Kit were moving in the direction of WS-MAN and that SOAP was being phased out with Intel AMT 7.0. If you are one of those developers who has been writing applications for AMT via the SOAP method, you might want to sit back and take inventory of where your SOAP calls are. In the blog post mentioned above, I tried to give some guidance on how to move forward and embrace the new and improved world of WS-MAN, but what I did not talk about was SOAP and the Intel® SCS.

Well, that’s just provisioning, right? And that’s only done once, right? Well… it goes deeper than the initial provisioning. If you used the Intel SCS 5.0 to provision your systems, then you will have an SQL database associated with your Intel AMT implementation. This is where the credentials to your AMT systems are stored, among other important information.

How are those systems then managed after provisioning has taken place? Your manageability application must query that SQL database to get the credentials so that AMT can be accessed and the system can then be managed, correct? Um… would you be using SOAP calls to query the database? Yes, you would. If you moved over to SCS 6.0, you probably figured all of this out and realized that you would need to add support for the WS-MAN APIs required to access the SCS 6.0 database (or you just didn’t move forward).

Now that SCS 7.0 is available, you may look for the SQL portion of the implementation and wonder… where is it? It’s gone. SCS 7.0 keeps information in XML files, and it is up to the developer to bring in his/her own database for storing credentials. And now you are probably scratching your head, wondering what to do with all your systems’ credentials that are stored in your SCS 5.0 database once you move to SCS 7.0. It is simple.
You download the SCS 7.0 (Source Kit) and inside you will find a gift. Yes. There is a Migration Utility in there that will help you move all your system credentials to the SCS 7.0 XML formats. From there you can transfer the information into the database of your choice. You might want to take a look at this guide: Intel®_SCS_5.x_Migration

Alternatively, you might be interested in finding out more about the new “Digest Master Password.” The Intel AMT Digest Master Password (DMP) is a single password that is synchronized by the IT administrator among the various management software applications. The protocol defines a method for deriving the Intel AMT administrator password from the DMP in a way that creates a unique password per device. Using this method, the software application does not need to maintain a password database. It also simplifies using multiple applications from multiple vendors to manage the Intel AMT device. You can read more about this in the Intel® AMT 7.0 Documentation. (Search for Digest Master Password.)

Development Release: Kororaa 14 Beta 3

Chris Smart has announced the availability of the third beta release of Kororaa 14, a Fedora-based desktop distribution: “Kororaa 14 (Nemo) beta 3 has been released for download, in 32-bit and 64-bit flavours with KDE and GNOME. It is recommended to back up your data and perform a….

Mail Them Pro

Mail Them Pro is a multi-threaded mass mailer and mailing list management tool. It comes with a built-in SMTP server and mail merge functionality, allowing you to personalize the newsletter with the receiver's name, email or any other data that is included in your list fields. You can use it to send… [ Newsletter ]