Enqueue

Platform uses a single GridTask of type PollAndEnqueueFiles to poll the filesystem for new files and enqueue them into a MessageQueue. This task is created automatically on startup as long as at least one node has the following TaskPerformer registered. (This is generally already set up for you in the default dvce-app-config.xml file for Platform.)

<TaskPerformerConfig>
  <Type>PollAndEnqueueFiles</Type>
  <NumberOfThreads>1</NumberOfThreads>
  <TaskPerformerClassName>com.transcendsys.platform.integ.fileqagent.PollAndEnqueueFilesTaskPerformer</TaskPerformerClassName>
</TaskPerformerConfig>

Increasing NumberOfThreads has no effect for this task, since only one IDLE task exists at any one time. (When a PollAndEnqueueFiles task completes successfully, it automatically creates a new IDLE task.) However, you may want multiple nodes to register this TaskPerformer so that if one node’s hardware fails, another node can take over polling.

PollAndEnqueueFilesTaskPerformer is driven by MessageSource and MessageSourcePoll records. These models dictate where to poll for files. For example, the following MessageSource and MessageSourcePoll would poll files from the C:/inbox directory and enqueue them into the inbox/MasterData queue if the file name ends with "Enterprise.csv":

<MessageSource>
  <ValueChainId>9123</ValueChainId>
  <Name>FileInbox</Name>
  <SourceType>File</SourceType>
  <Config>{ srcDir: 'C:/inbox' }</Config>
</MessageSource>
<MessageSourcePoll>
  <ValueChainId>9123</ValueChainId>
  <Name>Enterprises</Name>
  <MessageSourceName>FileInbox</MessageSourceName>
  <GroupName>1</GroupName>
  <Precedence>1</Precedence>
  <IncludeExpr>.*Enterprise\.csv</IncludeExpr>
  <InboundQueueName>inbox/MasterData</InboundQueueName>
  <InboundInterface>ZBKS.EnterpriseLoad</InboundInterface>
  <InboundInterfaceVersion>1.0</InboundInterfaceVersion>
</MessageSourcePoll>

There is also some polling configuration in the instance config (InstanceConfig.xml). Specifically, the PollingIntervalInSeconds setting there determines the interval used when polling. (The default value is 10 seconds.)

Polling is done first by MessageSource; within a MessageSource, the MessageSourcePolls are then grouped by GroupName. Within a given group, if the enqueue of messages fails for any reason, any subsequent MessageSourcePolls in that same group are skipped. MessageSourcePolls without a GroupName are evaluated separately, without dependency.
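
To illustrate the grouping rule, here is a sketch of two MessageSourcePoll records that share a GroupName, reusing the field layout of the example above. The second poll (Sites), its file pattern, and the ZBKS.SiteLoad interface name are hypothetical. That Precedence controls the evaluation order within a group is also an assumption made for illustration.

```xml
<!-- Both polls belong to group 1 of the FileInbox MessageSource. -->
<MessageSourcePoll>
  <ValueChainId>9123</ValueChainId>
  <Name>Enterprises</Name>
  <MessageSourceName>FileInbox</MessageSourceName>
  <GroupName>1</GroupName>
  <Precedence>1</Precedence>
  <IncludeExpr>.*Enterprise\.csv</IncludeExpr>
  <InboundQueueName>inbox/MasterData</InboundQueueName>
  <InboundInterface>ZBKS.EnterpriseLoad</InboundInterface>
  <InboundInterfaceVersion>1.0</InboundInterfaceVersion>
</MessageSourcePoll>
<!-- Hypothetical second poll in the same group, assumed to be evaluated after Enterprises. -->
<MessageSourcePoll>
  <ValueChainId>9123</ValueChainId>
  <Name>Sites</Name>
  <MessageSourceName>FileInbox</MessageSourceName>
  <GroupName>1</GroupName>
  <Precedence>2</Precedence>
  <IncludeExpr>.*Site\.csv</IncludeExpr>
  <InboundQueueName>inbox/MasterData</InboundQueueName>
  <InboundInterface>ZBKS.SiteLoad</InboundInterface>
  <InboundInterfaceVersion>1.0</InboundInterfaceVersion>
</MessageSourcePoll>
```

Under these assumptions, if the enqueue for Enterprises fails, the Sites poll would be skipped, as described above.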

Each time a file is encountered, a new Message will be enqueued if both of the following hold:

  • IncludeExpr is not provided, or it is provided and the regular expression matches the file name

  • ExcludeExpr is not provided, or it is provided and the regular expression does not match the file name
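
As a sketch of how the two expressions combine, the following MessageSourcePoll includes files ending in Enterprise.csv but excludes any whose path contains "_draft". The exclusion pattern is hypothetical, and writing ExcludeExpr as an element next to IncludeExpr is an assumption based on the field name mentioned above.

```xml
<MessageSourcePoll>
  <ValueChainId>9123</ValueChainId>
  <Name>Enterprises</Name>
  <MessageSourceName>FileInbox</MessageSourceName>
  <GroupName>1</GroupName>
  <Precedence>1</Precedence>
  <!-- Candidate files must end in Enterprise.csv. -->
  <IncludeExpr>.*Enterprise\.csv</IncludeExpr>
  <!-- Hypothetical exclusion: skip anything with "_draft" in its path (element placement assumed). -->
  <ExcludeExpr>.*_draft.*</ExcludeExpr>
  <InboundQueueName>inbox/MasterData</InboundQueueName>
  <InboundInterface>ZBKS.EnterpriseLoad</InboundInterface>
  <InboundInterfaceVersion>1.0</InboundInterfaceVersion>
</MessageSourcePoll>
```

With these patterns, C:/inbox/Enterprise.csv would be enqueued, while C:/inbox/East_draft_Enterprise.csv would be skipped because it also matches ExcludeExpr.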

Caveats and Best Practices

A common problem with file polling is picking up a file while it is still being copied. To avoid this, we strongly recommend that clients transfer the file under one name and then rename it immediately after the transfer completes. For example, transfer a file as orders.csv and then rename it to orders.csv.READY when the transfer is complete. The FileEnqueueConfig could then be configured as follows to ensure files still in transfer are not enqueued:

  • IncludeExpr: .*\.READY
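
Expressed as a MessageSourcePoll in the style of the earlier example, this might look like the following sketch. It assumes the IncludeExpr setting maps onto the MessageSourcePoll field of the same name. The Name value Orders, the inbox/Orders queue, and the ZBKS.OrderLoad interface are hypothetical.

```xml
<MessageSourcePoll>
  <ValueChainId>9123</ValueChainId>
  <Name>Orders</Name>
  <MessageSourceName>FileInbox</MessageSourceName>
  <GroupName>1</GroupName>
  <Precedence>1</Precedence>
  <!-- Only files already renamed to *.READY match; orders.csv is ignored while it is still in transfer. -->
  <IncludeExpr>.*\.READY</IncludeExpr>
  <InboundQueueName>inbox/Orders</InboundQueueName>
  <InboundInterface>ZBKS.OrderLoad</InboundInterface>
  <InboundInterfaceVersion>1.0</InboundInterfaceVersion>
</MessageSourcePoll>
```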
    

There is one additional caveat regarding the order in which messages are enqueued. Within each FileEnqueueConfigGroup, files are enqueued in timestamp order, and files with the same timestamp are further ordered by file name. On some file systems, e.g. ext3 on Linux, the finest timestamp granularity supported on files is 1 second. As a result, if you are using such a file system, need messages to be processed in the order received, and may receive multiple files within the same second, you should use a file naming convention that disambiguates them. For example, the client dropping the files could prefix each file name with a timestamp.

Debugging

The most common reason that a file won't be picked up by the poller is a problem with the regular expression match. You can enable DEBUG logging on this by opening any Java class in studio, right-clicking on its classname in the Outline view, and choosing Logging > Shortcuts > File Enqueue/Dequeue > DEBUG.

Please note in particular that the expression must match the entire absolute pathname. If your expression is site\.csv, it will not match a file dropped as /inbox/site.csv, whereas the pattern .*site\.csv would match.

Additionally, please note that the file enqueue mechanism will ignore any file whose absolute path and timestamp match a file already enqueued. This means you must touch a file if you are trying to drop it for reprocessing.