Input and Output (I/O)
I/O is the major performance hurdle for any process within ONE, and engines are no different. Generally, I/O for engines is limited to the OLTP database or logs. A few engines do work directly with the file system for inbound/outbound data exchanged with external processes.
Logs
All engines must write some form of logs. Careful selection of what is logged is paramount.
Generally, the highest-level log should record when the engine was started, when each task was started, the parameters the engine was run with, the parameters for a particular task, when the task completed, and how much data was operated on (e.g., the number of forecast records generated). In other words, roughly 2 + 2 × (number of tasks) rows written to the database. Nice and concise.
More detailed log levels will include more debug-level information. This will require more complex string manipulation. Care should be taken that the manipulation is performed only if the logger is set to that level.
OLTP Database
Interaction with the database is the most common performance issue for engines.
Architecture
The engine should be architected in a Read, Process, Write pattern, where the Read and the Write are each separated from the Process. This ensures the Read and Write logic can be optimized for a single read and a single write. The nature of the problem, typically volume, may require this to be amended, but the concept should not be completely scrapped – the read and write need to happen in a controlled, limited number of operations, not once per iteration of the processing loop.
Queries should be defined in SQL Def. The SQL Def allows for extensibility and modification, so that a custom query can be substituted when it performs better than the generic variation.
Select
Model Reads are very expensive. Generally speaking, an engine has privilege over the entire database but should limit itself to the appropriate VC and Enterprise, which are identified by the user running it. There is then no need to rely on the platform and the Model Read to also limit the engine.
Reads done by select statements must be optimized.
Oracle Enterprise Manager
OEM can identify long-running queries. If poor engine performance is caused by a poorly performing query, it will show up here. Details on how to use OEM are beyond the scope of this document, but an understanding of this or a similar tool is key to finding poorly performing queries.
Explain Plan
Explain Plans describe how Oracle parses and executes a query. They identify the joins, the indexes used by those joins, the cardinality (a rough estimate of the amount of data), and the cost (a rough estimate of execution time). Details on how to read the explain plan are beyond the scope of this document, but a full understanding of it is key to optimizing queries.
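A plan can be generated and inspected from any SQL client using Oracle's DBMS_XPLAN package. The table, column, and bind names below are illustrative, not taken from the actual schema:

```sql
-- Generate the plan for the query of interest (names are illustrative).
EXPLAIN PLAN FOR
SELECT b.buffer_id, i.item_name
  FROM buffer b
  JOIN item i ON i.item_id = b.item_id
 WHERE i.enterprise_id = :ent_id;

-- Display the plan, including cardinality and cost estimates per step.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```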
Max Number of rows returned by Select
Platform limits the reading of data from a select statement to 50,000 rows; any query returning more will result in an error. If the engine needs to read more than 50,000 rows, special care must be taken to page the read so it can be done in blocks. The paging must be designed thoughtfully in order to retrieve all the rows without putting too much strain on the database and system. For example, one recorded case placed the select statement in a loop and ran it once for every buffer in the system, issuing 100,000 queries against the database. Instead, the solution is to read roughly 50,000 rows, then the next 50,000, and so on. Oracle supports ROWNUM for such paging.
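A conventional Oracle pattern pages with ROWNUM over an ordered subquery; the table and bind names here are hypothetical. A deterministic ORDER BY is required so that consecutive pages neither overlap nor skip rows:

```sql
-- Fetch rows (:lo, :hi] of the result set, e.g. :lo = 0, :hi = 50000,
-- then :lo = 50000, :hi = 100000, and so on until a page comes back short.
SELECT *
  FROM (SELECT t.*, ROWNUM rn
          FROM (SELECT buffer_id, item_id, quantity
                  FROM buffer
                 ORDER BY buffer_id) t     -- deterministic order is essential
         WHERE ROWNUM <= :hi)
 WHERE rn > :lo;
```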
Indexes Along Foreign Keys
When writing a complex query with many joins, it is important that the fields being joined are indexed. For example, if we join buffer and item, and buffer has a foreign key to item, then the item ID column on the buffer table should be indexed.
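A sketch of that example, with hypothetical table and column names:

```sql
-- Index the foreign-key column used by the join (names are illustrative).
CREATE INDEX buffer_item_fk_idx ON buffer (item_id);

-- The join can now locate buffer rows for each item via the index
-- instead of scanning the whole buffer table.
SELECT b.buffer_id, i.item_name
  FROM item i
  JOIN buffer b ON b.item_id = i.item_id;
```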
Stats for Indexes
Proper indexes must be created, but an index in Oracle may not be properly maintained. After a large data load where, say, one million items are loaded, the index (a tree) can be extremely unbalanced, resembling a list rather than a tree. Gathering stats allows the index to be reorganized, restoring logarithmic search times rather than linear ones.
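After a large load, statistics can be refreshed with Oracle's DBMS_STATS package; a degraded index structure can additionally be rebuilt. Schema, table, and index names below are illustrative:

```sql
-- Refresh optimizer statistics for the loaded table; CASCADE => TRUE
-- also gathers stats on its indexes.
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'ONE', tabname => 'ITEM', cascade => TRUE);

-- If the index tree itself has become unbalanced, rebuild it.
ALTER INDEX item_pk_idx REBUILD;
```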
Joins and Their Order
The order of joins matters. Oracle is generally very intelligent about ordering joins to cut down the amount of data. Imagine a query for the number of students that attend one or more of Professor Bob's classes. If we join all the students to all the classes they take, this could result in millions of rows. If we instead start from the classes Professor Bob teaches, we might be talking about five classes. When we then join those classes with the students, we only touch a much smaller subset of students and classes.
You can experiment with rearranging the order of the inner joins in the FROM clause and review the result with the Explain Plan. What you are looking for is a reduction in cardinality.
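The classroom example above can be sketched as follows, with a hypothetical schema. Starting from the filtered class table keeps the intermediate result small; rearranging the joins in the FROM clause and comparing cardinality in the explain plan shows whether the optimizer picked that order on its own:

```sql
-- Count distinct students in Professor Bob's classes. The filter on
-- class shrinks the join input to a handful of rows before enrollment
-- (potentially millions of rows) is touched.
SELECT COUNT(DISTINCT e.student_id)
  FROM class c
  JOIN enrollment e ON e.class_id = c.class_id
 WHERE c.professor = 'Bob';
```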
Hints
Oracle is very smart about which indexes it chooses to use. Sometimes it makes a decision that is logical in a general sense but suboptimal for the specific use case. Hints can be added to queries to force Oracle's hand.
Note that hints are often specific to a particular database instance; what is good for one client is not good for another. Hints should therefore be avoided as a first-step solution. Instead, extensibility should be built into the engine so that an instance can override the query with one that carries the hint.
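For illustration, an index hint uses Oracle's `/*+ ... */` comment syntax; the table alias and index name here are hypothetical:

```sql
-- Force Oracle to use a specific index for this query.
SELECT /*+ INDEX(b buffer_item_fk_idx) */
       b.buffer_id, b.quantity
  FROM buffer b
 WHERE b.item_id = :item_id;
```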
Assistance from DBA
At ONE we expect our developers to be SQL experts, but sometimes developers lack the tools or training to make the difficult calls regarding Oracle configuration or some of the specific choices concerning indexes and other features. For example, an index may be partitioned to gain an even greater performance benefit; you won't find that suggestion in an explain plan.
Filter and Access Predicates
An interesting mechanism that developers often forget about is how indexes and data actually work. An index is stored separately from the table data. When Oracle uses an index to find a row, it actually finds a row ID and then looks up the data in the table with that row ID. This extra step takes only microseconds, but across millions of rows it is more than Oracle can do in a single pass; it has to page as it looks back and forth between the index and the data table. This is painful when columns from the data table are required for part of the WHERE clause. Imagine a table of cars indexed by license plate, and suppose we want the green cars with a license plate that starts with an A. We can find all the plates that start with A via the index, but Oracle has to swap back and forth to check each car's color in order to reduce the list to only the green cars. If the color were part of the index, this swapping would not be required.
Access and filter predicates are outside the scope of this document. Read more about them and understand what the explain plan is trying to tell you. Adding an extra field to the index can be a huge saving for queries that have to return a large number of rows.
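The car example can be sketched with a composite index; the table and column names are hypothetical. Because every column the query touches is in the index, Oracle can answer it from the index alone, with no lookup into the data table per row:

```sql
-- Include color alongside license_plate so the color check becomes a
-- filter on the index itself rather than a per-row table lookup.
CREATE INDEX car_plate_color_idx ON car (license_plate, color);

-- Answered entirely from the index: both predicates and both selected
-- columns are index columns.
SELECT license_plate, color
  FROM car
 WHERE license_plate LIKE 'A%'
   AND color = 'GREEN';
```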
Inserts and Updates
Model Inserts and Updates are very expensive. It takes time to settle the foreign keys and security, as well as the workflows and callbacks. If these are not required, then writing a SQL Def to be used by the SQL Service is better. Better still than a SQL Service is a prepared statement executed against the JDBC connection. When inserting thousands of rows, the prepared statement can save a prodigious amount of time and take significant overhead off the Oracle server.
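The SQL handed to such a prepared statement is simply a parameterized insert; the statement is parsed once and then executed repeatedly (ideally in JDBC batches) with fresh binds per row. The table and column names here are illustrative:

```sql
-- Parsed once by Oracle; each execution supplies new bind values,
-- avoiding a hard parse per row.
INSERT INTO forecast (forecast_id, item_id, period_start, quantity)
VALUES (:1, :2, :3, :4);
```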
Merge Into is an alternative to separate Insert and Update statements. It runs a select to determine whether the row exists; if it does, it updates, otherwise it inserts. This can be done with a prepared statement. The select portion obviously has to be optimized, so care must be taken with indexing, joins, etc.
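A minimal sketch of such an upsert, with a hypothetical schema. The USING/ON portion is the select that must be optimized; here it should be able to use an index on (item_id, period_start):

```sql
-- Upsert one forecast row: update the quantity if the key exists,
-- otherwise insert a new row.
MERGE INTO forecast f
USING (SELECT :item_id      AS item_id,
              :period_start AS period_start,
              :quantity     AS quantity
         FROM dual) s
   ON (f.item_id = s.item_id AND f.period_start = s.period_start)
 WHEN MATCHED THEN
   UPDATE SET f.quantity = s.quantity
 WHEN NOT MATCHED THEN
   INSERT (item_id, period_start, quantity)
   VALUES (s.item_id, s.period_start, s.quantity);
```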