Tags

, , , , , , , ,

Background

Using Microsoft PRISM framework makes modularity an easy architectural approach for App development.  But as the number of modules grow along with their internals’ complexity, it becomes time expansive and time consuming to perform heterogeneous operations; some of which must cross modules boundary such as search requests.

Introduction

Search is a fundamental part of modern applications and it will serves us appropriately to stay inline with the title of this article.

Often time, a module may itself be composed of multiple parts that are loosely couple by way of MEF or Unity.  Some parts may be available in the module initial release while new parts may be added to the module future releases.  Each part may represent a single searchable entity or share one or more storage entities with other parts.  The main challenges we addressed were how to present a potentially searchable entity(cache) as being searchable without manually creating search behaviors || search traversal paths for every pattern possible including parent-child linked relationships.  Published, a searchable entity must presents itself as a node with required meta-data to either a Search Provider or a Search Manager, thus allowing the system to perform basic search queries on the entity as well as joints in traversing nodes to reach the  proper node.  Moreover, the framework was build around the structures provided by the .NET framework such as MEF and PRISM

Example: nodes A, B, C

Node “A” has properties: xL, xM, xN

Node “B” has properties: yL,yM, yN, yO

Node “C” has properties zL, zM, zN, zP

Scenario I:

The client sends Request “L = ###” to the search provider, which delegates the search to the individual  search nodes having : Property L , and its L property having value “###”.  In this scenario, the client is not searching for a single entity but rather for multiple entities satisfying the query.  For the Application/(  framework client) to easily process the replies originating from heterogeneous nodes uniformly,  a vector/List of SearchResults type is used to aggregate the responses.  A SearchResult should be ready for presentation; that is well formatted for view.  This type of search is comprehensible from an app user standpoint where a user just types in strings to locate or a label, operator, then strings to locate.

Scenario II:

The Application composes and sends Request “Manager, FirstName = Larry” which translates, in context, to: “B, yN = ####”. Find all B objects satisying the expression yn = ###.  In this scenario, the provider locates the appropriate SearchNode, and forwards it the expression and awaits enumerables Bs that satisfy the query.

Scenario III:

the Application composes and sends Request: “Manager, Order, Date >= 12-22-82” || “Manager, Order.Date >= 12-22-82”.  Either Format states the same request; which is basically find all manager with the orders having Date property  and the Date property having a value greater or equal to “12-22-82”.  In this scenario, the provider must perform a joint across SearchNode (s) to satisfy the query.  As an example, imagine the need to traverse more than one SearchNode to satisfy the query because the Manager and Order caches do not have any direct joining endpoints.  In the simplest form possible, thus eliminating multiple levels of hierarchical traversal/indirection, the following sketch is to serve as a proof of concept.

The SearchProvider must intelligently provision a well structured query with the appropriate jointsSleeping half-moon from its list of imported SearchNode (s) that returns an enumerable of Manager that satisfy the query.  The SearchProvider’s  query must account for the optimal path from Manager to Order || vice versa

Design Phase

Bits and bytes.  Binary strings are the simplest and most compact form (storage efficient) for storing data .  For that reason, the SearchProvider will use binary strings to identify and classify the SearchNode.

Lets start off by using 8 bytes to classify our search nodes by level of hierarchy.  SearchNode(s) not having or declaring any link to another search node, will be classified as being at level I , while SearchNode(s) declaring a Level I SearchNode(s) as parent will be @ the next level from that of its parent; that is level II and so on…

By using 8 bytes, we can have 8 different levels

example:

So, how is a SearchNode level identified?   Well, during registration as part of its meta, a SearchNode declares its name, queryable properties and endpoints linking it to a parent node.  The SearchProvider uses that information to locate a SearchNode immediate parents, registers the SearchNode with a unique identifying byte Strings (more on that later) and then places it on the next levels available after the different parents.   In the above scheme, observe node P.—> how do we identify a node and its properties using a single byte?    By using the first 3 bits to identify a Node, we can have 2^3 –1 ((2 raise to power of 3) -1) Nodes per level if the nodes are to have a unique id irregardless of the parent or 7  child nodes per parent if the child identifier is composed of its parent identifier.  Again, your need would dictate which approach you choose per search provider, module, or application.  The remaining bits 2^5 –1 ((2 raise to power of 5) -1) = 31 will serve to identify the queryable properties of a node.  Feel free to partition your bits according to your taste.  Lets assume the search provider registered the aforementioned Nodes as follow:

If you look close attentively, you’ll notice that the identifier in itself provides the path between linked nodes, from which, you can devise the proper query that joins adjacent nodes.

In the implementing the search framework, I  relied on Domain Specific Language to remove all burden from an application developer viewport.  In a future blog, I’ll demonstrate a simple memoizing backtracker.

Advertisements