Algorithmic Complexity
Algorithmic Complexity is the measure of how long an algorithm will take to execute. You may recall from Computer Science coursers Big-O or you may have heard algorithms described as linear, polynomial or non-polynomial. And you may recall the non-polynomial (NP) problems are generally not solvable because they simply never complete.
Generally Algorithmic Complexity is not a huge problem for engines not concerned with optimization. Calculation of a forecast for example is a simple loop through items and calculate a forecast for each. But an algorithm designed to search through data or through combinations of actions like moves on a chess board are subject to Algorithmic Complexity issues.
Number of loops
The easiest way to visualize Algorithmic Complexity is to think about the loops in engine.
If you loop over the number of items (n) one time then the algorithm is linear. If the number of items increases the run time will increase linearly. If it takes 5 minutes to solver 1000 items then it will take 10 minutes to solve 2000 items.
If you loop over the number of items (n) one time and then for each loop you loop of the items again then this is n*n or . This is called polynomial. The run time expands by that polynomial as the data increases. If we have algorithm that takes 5 minutes for 1000 items then it will take 20 minutes; twice the data takes 4 times as long.
Most developers know this either through training or through experience. Driving algorithm design as close to linear as possible. But developers can sometimes be fooled with data structures and the operations on those data structures OR with different data elements that mask the nature of the loops.
Data Structures
Take an algorithm that must loop through a list of items and for each item look up the item in a Hash Map. What is the run time of this algorithm? A developer not thinking deeply about the complexity may tell you it’s linear. The loop through the list of items only once. But the look up in Hash Map is not free – it’s done in log n time. The run time is actually n log n – sometimes called super linear.
Super linear and linear are often the best you can achieve so the difference is purely academic but this isn’t always the case. Consider an algorithm that might require the items to be sorted. A developer might be tempted to Collections.sort(). But this sort is a Merge Sort which would be n log n. If you do this sort inside of a loop over items then you have n * n log n which is .
Load on a loop
The load on the loop is essentially the amount of work that the loop is doing. From a pure computer science standpoint this is not significant. The computer scientist is more worried about the Algorithmic Complexity or the Big-O. But the load on the loop can also be a major contributor to pain.
Consider the algorithm that loops through the items (n) one time. The algorithm is linear. But now let’s assume the algorithm makes a query inside each loop. And that query takes 1 second. If there is 1000 items then it will take 1000 seconds (16 minutes). Then let’s say we pull the query out of the loop and instead have one query for all 1000 items before the loop starts and that query takes 5 seconds reducing each step of loop to only 0.1 seconds. That will only take 105 seconds (1.75 minutes). That’s almost an order of magnitude (10x) savings just by moving the query.
Queries are obvious places for optimization. But less obvious places include string manipulation. For example, JSON strings. An Engine may have a JSON string of parameters. And each step of loop it looks through the parameters to help carry out of some logic. Each step of the loop the string is dissected and broken into parts to find the parameter values. This can lead to significant load on the loop. Debug Log Files is another area of concern. Debug logs are only written if the node is set to debugging log level – very rarely. Debug logs are also very complex with lots of concatenation and string manipulation to get meaningful debug data to the logs. If this is done deep within the loop then it can create significant load on the loop. It may be necessary – but ONLY if the logging is turned on. If you build the string and pass it to the Logging service then you are concatenating the string even if the logging is turned off. Instead, a test for the logging level should be done before the concatenation. It seems silly – but load on the loop can become expensive as the loop increases in size.