Batch Processing - alive and kicking in modern IT

Optimize computing resource usage and automate your workloads with modern batch processing

4 min

Oct 4, 2023 from Activeeon

Batch processing is a method of running software tasks, called jobs, automatically in batches. In this article, you’ll learn what batch processing is, why it is needed, and how it has evolved for the modern IT environment.

What is batch processing?

Batch processing occurs when a computer automates repetitive tasks in a group. It’s a method for running high-volume, automated data jobs without user interaction.

Batch processing is still very much alive in today’s companies, and it can run either on an as-needed basis or on a regular schedule.
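To make the definition concrete, here is a minimal sketch in Python of what "running tasks in a group without user interaction" can look like. The names `batched` and `run_batch_job` are hypothetical and chosen for illustration only; they are not part of any particular scheduler.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_batch_job(
    records: Iterable[T],
    batch_size: int,
    process: Callable[[List[T]], None],
) -> int:
    """Process every record group by group; return the number of batches run."""
    count = 0
    for batch in batched(records, batch_size):
        process(batch)  # hypothetical per-batch work, e.g. a bulk database load
        count += 1
    return count
```

A scheduler would typically call something like `run_batch_job` on a timer or in response to a request, rather than a person launching it by hand.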

Batch processing and modern IT

The literature often describes batch processing as slow, unsophisticated, high-latency, unscalable, and resource-constrained.

All this may be true if businesses continue to use the scheduler solution they bought 20 years ago.

In this document, we’ll discuss how batch processing, supported by a modern scheduler, fits into today’s IT requirements.

The needs for batch processing

As previously indicated, batch processing is still used effectively for IT operations activities, especially at night. Yet ITOps, DevOps, and SRE teams are increasingly using batch processing for more advanced requirements during working hours.

With the growth of big data and AI, and the advent of advanced schedulers such as ActiveEon, batch processing can provide substantial benefits.

The image below shows how ByteDance, the company behind TikTok, extracts a specific type of data from a large dataset and, by launching individual batches in parallel, obtains several types of data analysis (predictions) based on text, pictures, or videos.

batch-processing-predictions

The key elements in this architecture are the scale of the compute power needed to run the algorithms and the ability to manage parallel processing so that results arrive as soon as possible.

Efficiency of batch processing

The size of the dataset typically varies during the day, and the IT staff would struggle to continuously adjust the necessary compute capacity in real time.

To that end, and in order to be effective, the scheduler must automatically scale (up and down) the compute resources in accordance with the needs of the batch processing.

batch-processing-risk-report

The scheduler actively monitors the underlying compute resources used by each batch process and adapts the capacity accordingly.

Whenever the batch processing requires additional resources to process the job, the modern scheduler is able to provision extra compute power as needed.
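As a rough illustration of scaling up and down with demand, the sketch below derives a target worker count from the number of pending jobs. The function name `desired_workers`, the `jobs_per_worker` ratio, and the default limits are all assumptions made for this example; a real scheduler would use its own metrics and policies.

```python
import math


def desired_workers(
    pending_jobs: int,
    jobs_per_worker: int,
    min_workers: int = 1,
    max_workers: int = 50,
) -> int:
    """Return how many workers to run for the current backlog.

    The result grows with the backlog and shrinks when the backlog drains,
    but always stays within [min_workers, max_workers].
    """
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    needed = math.ceil(max(pending_jobs, 0) / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

Calling this function periodically and provisioning or releasing machines to match the result is one simple way to keep capacity in line with a dataset whose size changes during the day.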

For the most advanced needs, modern schedulers should also be able to choose the appropriate resources based on the nature of the algorithm.

To demonstrate this idea, consider how a quant analyst running a Monte Carlo simulation would require a lot of CPU power to crunch the data, as opposed to a marketing executive working on texts or photos, who would require compute power with a lot of RAM.

batch-processing-survey

Modern schedulers triage batch jobs toward the appropriate group of compute resources for better results.
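The triage idea can be sketched as a small routing function. Everything here is hypothetical: the `JobProfile` fields, the pool names, and the threshold of 8 GB of RAM per core are illustrative assumptions, not values taken from any specific scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Resource needs declared for one batch job (illustrative fields)."""
    name: str
    cpu_cores: int
    memory_gb: int
    needs_gpu: bool = False


def choose_pool(job: JobProfile) -> str:
    """Pick a resource pool based on the nature of the job."""
    if job.needs_gpu:
        return "gpu-pool"
    # Assumed rule of thumb: a high RAM-to-core ratio means memory-bound work.
    memory_per_core = job.memory_gb / max(job.cpu_cores, 1)
    if memory_per_core >= 8:
        return "high-memory-pool"
    return "high-cpu-pool"
```

With these assumptions, a Monte Carlo job declared with 32 cores and 64 GB of RAM is routed to `high-cpu-pool`, while a job declared with 4 cores and 128 GB of RAM goes to `high-memory-pool`.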

These sophisticated schedulers are increasingly referred to as orchestrators.

Batch processing and parallel processing

The second key element in the use case is parallel processing.

Several batch processes run concurrently, each processing different data from a single dataset. Each parallel batch process uses a set of dedicated CPUs or GPUs to run its algorithm at the same time. Because end users may use software from several sources, firms frequently adopt a hybrid environment for this.

A typical use case at a German bank based in London is:

  • batch 1 using a third-party solution set up in their private cloud;
  • batch 2 using an algorithm created on-premises by the DevOps team;
  • batch 3 using an open-source solution in a public cloud.
batch-processing-performance
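The parallel pattern described in this section, several batches working at the same time on different parts of one dataset, can be sketched with Python's standard library. The helper names `split` and `run_parallel` are hypothetical. Threads are used here only to keep the example small; CPU-heavy work would more likely run in separate processes or on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split(dataset: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Split a dataset into `parts` contiguous slices of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(dataset), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(dataset[start:end])
        start = end
    return slices


def run_parallel(
    dataset: Sequence[T],
    parts: int,
    analyze: Callable[[Sequence[T]], R],
) -> List[R]:
    """Run `analyze` on each slice concurrently; results keep slice order."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return list(pool.map(analyze, split(dataset, parts)))
```

In a hybrid setup like the one listed above, each slice could instead be handed to a different environment, with the scheduler collecting the results.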

Batch processing and real-time events

Batch processing, which often runs on a regular schedule, is increasingly used on an as-needed basis.

There are many situations that can cause a batch job to start, such as when a customer downloads a file from your company website for analysis or when the accounting department requests an ad-hoc report that requires access to several different programs and databases.

A modern scheduler has the ability to listen for events and run a series of batch operations to complete the process.
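As a minimal sketch of event-triggered batches, the class below maps an event type to an ordered series of steps and runs them when that event arrives. `EventTriggeredScheduler`, its methods, and the event name used in the usage note are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# A step takes the current context and returns the updated context.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


class EventTriggeredScheduler:
    """Runs a registered series of batch steps when an event is received."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, List[Step]] = defaultdict(list)

    def on(self, event_type: str, *steps: Step) -> None:
        """Register the steps to run, in order, for an event type."""
        self._pipelines[event_type].extend(steps)

    def handle(self, event_type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Run every step registered for the event; return the final context."""
        context = dict(payload)  # copy so the caller's payload is not mutated
        for step in self._pipelines.get(event_type, []):
            context = step(context)
        return context
```

For instance, an ad-hoc report request could be registered as `scheduler.on("report_requested", extract, aggregate, render)`, where the three step functions are placeholders for calls to the different programs and databases involved.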

Summary

Overall, batch processing is still very much in use and is one of the essential instruments in the arsenal of a successful IT operations manager.

Learn more about our batch scheduling software

If you want to know how advanced batch scheduling can meet your requirements, please get in touch with us so that we can show you some real use cases of batch processing jobs.
