The Intel® Threading Building Blocks ( Intel® TBB ) flow graph is fully supported in Intel® TBB 4.0. If you are unfamiliar with the flow graph, you can read an introduction here . Figure 1 below shows a flow graph that implements a simple feature detection application. A number of images will enter the graph and two alternative feature detection algorithms will be applied to each one. If either algorithm detects a feature of interest, the image will be stored for later inspection. In this article, I’ll describe each node used in this graph, and then provide and described a complete working implementation. Figure 1: The Intel® TBB flow graph for the feature-detection example. In the figure, there are four different type of nodes used to construct the application: a source_node , a queue_node , two join_node s, and several function_node s. Before I provide a sample implementation, I’ll provide a brief overview of each node. The first type of node is a source_node , which is shown pictorially using the symbol below. This type of node has no predecessors, and is used to generate messages that are injected into the graph. It executes a user functor (or lambda expression) to generate its output. The unfilled circle on its right side indicates that it buffers its output and that this buffer can be reserved. The source_node buffers a single item. When a buffer is reserved, a value is held for the caller until the caller either consumes or releases the value. A source_node will only invoke the user functor when there is nothing currently buffered in its single item output buffer. The second type of node is a queue_node , which is show using the figure below. A queue_node is an unbounded first-in first-out buffer. Like the source_node , its output is reservable. The third type of node, of which there are two variants used in the example, is the join_node . A join_node has multiple input ports and generates a single output tuple that contains a value received at each port. A join_node can use different policies at its input ports: queueing , reserving or tag_matching . A queueing join_node , greedily consumes all messages as they arrive and generates an output whenever it has at least 1 item at each input queue. A reserving join_node only attempts to generate a tuple when it can successfully reserve an item at each input port. If it cannot successfully reserve all inputs, it releases all of its reservations and will only try again when it receives a message from the port or ports it was previously unable to reserve. Lastly, a tag_matching join_node uses hash tables to buffer messages in its input ports. When it has received messages at each port that have matching keys, it creates an output tuple with these messages. Shown below are the symbol for the reserving and tag_matching join_node s used in Figure 1. The final node type used in this example is a function_node ; it uses the symbol shown below. A function_node executes a user-provided functor or lambda expression on incoming messages, passing the return value to its successors. A function_node can be constructed with a limited or unlimited allowable concurrency level. A function_node with unlimited concurrency creates a task to apply its functor to each message as they arrive. If a function_node has limited concurrency, it will create tasks only up to its allowed concurrency level, buffering messages at its input as necessary so that they are not dropped. To save on space, I’m going to fake the image processing parts of this example. In particular, each image will simply be an array of characters. An image that contains the character ‘A’ has a feature recognizable by algorithm A, and an image that contains the character ‘B’ has a feature recognizable by algorithm B. So in the post, I will provide the complete code to construct and execute a flow graph that has the structure shown in Figure 1, but I’ll replace the actual computations with trivial ones. Below is the declaration of struct image , as well as the trivial implementations that can be used as the bodies of the function nodes. The function get_next_image will be used by the source_node to generate images for processing. You might note that in get_next_image , every 11th image will have a feature detectable by algorithm A and every 13th image will contain a feature detectable by algorithm B. The function preprocess_image adds a simple offset to each character, and detect_with_A and detect_with_B do the trivial search for the characters ‘A’ and ‘B’, respectively. #include #include const int num_image_buffers = 100; int image_size = 10000000; struct image { const int N; char *data; image(); image( int image_number, bool a, bool b ); }; image::image() : N(image_size) { data = new char[N]; } image::image( int image_number, bool a, bool b ) : N(image_size) { data = new char[N]; memset( data, ‘
Posts Tagged ‘intel’
A feature-detection example using the Intel® Threading Building Blocks flow graph
Posted by admin on September 12th, 2011


