More and more organizations are adopting a hybrid approach to their IT environment. In the context of computing and IT infrastructure, a hybrid environment combines multiple types of resources or technologies, typically involving a mix of on-premises infrastructure, private cloud, and public cloud services. It allows organizations to leverage the benefits of both on-premises and cloud-based solutions to meet their specific requirements.
This approach particularly appeals to larger organizations with a mainframe footprint because of the rich legacy of mission-critical applications that run on mainframes. Mainframes are designed to handle large-scale, high-performance, and mission-critical workloads and excel in processing and managing vast amounts of data and transactions. So, organizations that have invested in mainframes rely on the platform’s high reliability, availability, and security.
At the same time, cloud computing offers tremendous benefits for new development of web and mobile applications on a cost-effective platform. The cloud is well-suited for many applications, particularly those that benefit from scalability, flexibility, cost-efficiency, and accessibility.
This means that organizations are keeping and extending their mainframe applications while also building out new applications using cloud services. Such an approach is called a hybrid environment or sometimes hybrid cloud computing.
Types of Mainframe Data
Data is crucial in modern development practices, enabling organizations to make informed decisions, optimize processes, and deliver better user experiences. For example, data is at the core of AI and Machine Learning (ML) development as ML models are trained on data sets to learn patterns and make predictions. And this is only one example: truly, all types of development require and produce data. Given its rich heritage, the mainframe is a phenomenal source of valuable data for all types of modern development.
But it can be challenging to access mainframe data out of context. Consider the numerous different types and formats of mainframe data that exist.
Data may be stored on many different Database Management Systems (DBMS) on the mainframe. Even though Db2 for z/OS is the leading mainframe DBMS today, several other popular DBMSes are used to power mainframe applications and store critical data, including IMS, IDMS, Adabas, and Datacom/DB. These DBMSes all store data differently and use multiple different models, including relational, network (where relationships between records are defined through sets and pointers), hierarchical (where data is organized in a tree-like structure with parent-child relationships), and others.
Mainframe data need not be stored in a DBMS, however. A lot of mainframe data is stored in flat files, also known as QSAM or physical sequential files. Flat files contain records with no structured relationships and require additional knowledge to interpret their content (for example, a COBOL copybook with a file description including fields, data types, and lengths).
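To make the copybook dependency concrete, here is a minimal sketch of decoding one fixed-width record in Python. The three-field layout, the field names, and the sample bytes are assumptions invented for illustration. A real file would be decoded according to its own copybook and the code page actually in use (cp037 is only one common EBCDIC code page).

```python
from decimal import Decimal

# Hypothetical copybook layout (an assumption, not from any real system):
#   05 CUST-ID    PIC X(6).
#   05 CUST-NAME  PIC X(10).
#   05 BALANCE    PIC S9(5)V99 COMP-3.   (7 digits + sign = 4 bytes)
TEXT_FIELDS = [("cust_id", 0, 6), ("cust_name", 6, 10)]
BALANCE_OFFSET, BALANCE_LENGTH, BALANCE_SCALE = 16, 4, 2


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the final nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    value = int("".join(str(d) for d in digits))
    if raw[-1] & 0x0F == 0x0D:  # 0xD marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_record(record: bytes) -> dict:
    """Slice the record by offset and decode each field by its type."""
    fields = {
        name: record[start:start + length].decode("cp037").rstrip()
        for name, start, length in TEXT_FIELDS
    }
    packed = record[BALANCE_OFFSET:BALANCE_OFFSET + BALANCE_LENGTH]
    fields["balance"] = unpack_comp3(packed, BALANCE_SCALE)
    return fields


# Sample record built by hand: EBCDIC text followed by a packed balance.
sample = (
    "C00042".encode("cp037")
    + "JANE DOE  ".encode("cp037")
    + bytes([0x01, 0x23, 0x45, 0x6C])
)
customer = decode_record(sample)
```

Without the layout, the same 20 bytes are just an opaque byte string, which is why the file description has to travel with the data.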
Another prevalent type of mainframe data is VSAM, or Virtual Storage Access Method. This is an access method for processing indexed or sequential records on direct access devices. There are three ways to access data in a VSAM file: random (or direct), sequential, and skip-sequential. As with flat files, VSAM files require a file definition to access, as there is no embedded description of the data other than perhaps a key.
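The three access patterns can be illustrated with a small in-memory model. This is only a conceptual sketch in Python: the KeyedDataset class and its method names are hypothetical, and it does not talk to VSAM or reproduce its index structures.

```python
from bisect import bisect_left


class KeyedDataset:
    """In-memory stand-in for a keyed file; a teaching model, not a VSAM client."""

    def __init__(self, records: dict):
        self._items = sorted(records.items())      # records kept in key order
        self._keys = [key for key, _ in self._items]

    def read_random(self, key):
        """Random (direct) access: locate one record by its key."""
        pos = bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._items[pos][1]
        return None

    def read_sequential(self):
        """Sequential access: every record, in key order."""
        for _, value in self._items:
            yield value

    def read_skip_sequential(self, keys):
        """Skip-sequential access: selected keys, given in ascending order,
        found by searching forward from the previous position only."""
        pos = 0
        for key in keys:
            pos = bisect_left(self._keys, key, pos)
            if pos < len(self._keys) and self._keys[pos] == key:
                yield self._items[pos][1]


dataset = KeyedDataset({"A003": "gamma", "A001": "alpha", "A002": "beta", "A005": "epsilon"})
one_record = dataset.read_random("A002")
all_records = list(dataset.read_sequential())
some_records = list(dataset.read_skip_sequential(["A001", "A005"]))
```

Skip-sequential sits between the other two: it moves through the file in key order like a sequential read, but jumps over records the application did not ask for.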
Another type of mainframe data that may be useful exists in log files. Log data is used with DBMSes to manage and record changing data, but it can also be used by transaction processing systems (such as CICS and IMS/TM) and other system software. The operating system, z/OS, also writes log data to multiple locations, such as the SYSLOG, the job log, the OPERLOG, the console, and more. All of these logs are formatted differently and can be difficult to interpret without additional context and documentation. Nevertheless, log data can be a valuable tool for system management and uncovering useful operational information.
We must also acknowledge that not all mainframe data is stored on disk. Mainframes often utilize magnetic tape storage for archival purposes. Tape data can only be accessed sequentially.
Obviously, the mainframe is a rich source of data. But how can we access this data in a hybrid environment from applications that may not be running on the mainframe?
How to Access Mainframe Data
There are many different ways for hybrid applications to utilize mainframe data. The key is enabling the application to understand and access the data in a way that makes the most sense for its operations.
One approach is to use Application Programming Interfaces (APIs) or Web Services to interact with mainframe data. These APIs enable authorized access to specific mainframe functions, data repositories, or transactions. Cloud applications can use standard interfaces such as RESTful APIs, SOAP (Simple Object Access Protocol), or messaging through IBM MQ (formerly WebSphere MQ) to communicate with mainframe systems and exchange data.
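As a rough sketch of the REST style of access, the Python below builds an authorized request for a single account and parses a JSON reply. The gateway URL, the /accounts/{id} path, the bearer-token scheme, and the response shape are all assumptions made for illustration. An actual API layer in front of the mainframe would define its own endpoints and security.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_account_request(base_url: str, account_id: str, token: str) -> Request:
    """Prepare (but do not send) a GET request for one account."""
    # Hypothetical resource path; the account id is escaped for safe use in a URL.
    url = f"{base_url.rstrip('/')}/accounts/{quote(account_id, safe='')}"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",  # assumed auth scheme
    }
    return Request(url, headers=headers, method="GET")


def parse_account(body: bytes) -> dict:
    """Decode a JSON response body into a dictionary."""
    return json.loads(body.decode("utf-8"))


# Placeholder host and token; nothing is sent over the network here.
request = build_account_request(
    "https://mainframe-gateway.example.com/api", "C00042", "example-token"
)
```

Passing the prepared request to urllib.request.urlopen would send it. The cloud application sees only HTTP and JSON, while the API layer handles the translation to the underlying transaction or data store.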
Another popular mechanism is deploying Middleware and Integration Platforms as intermediaries between cloud applications and mainframe systems. They provide connectors, adapters, or APIs specifically designed to interface with mainframe environments. These platforms facilitate seamless data integration, transformation, and messaging between cloud applications and mainframe data sources.
Message Queues and Event Streams also allow cloud applications to access mainframe data. Messaging technologies such as IBM MQ or Apache Kafka can be used to exchange messages or events with mainframe systems. Mainframe applications can publish messages to a queue or to an event stream, allowing cloud applications to consume and process the data in near real-time.
It is also possible for cloud applications to connect directly to mainframe databases using the appropriate Database Connectivity protocols. For example, Db2 for z/OS supports industry-standard database connectivity options like ODBC (Open Database Connectivity), JDBC (Java Database Connectivity), and SQLJ (SQL in Java) for accessing mainframe data. Cloud applications can use these interfaces to query, update, or manipulate mainframe databases directly.
Another possibility is to move the data from the mainframe to the cloud using ETL (Extract, Transform, Load). ETL procedures can be deployed to extract data from mainframe systems, transform it into a suitable format, and load it into cloud-based data storage or analytics platforms. This approach involves extracting mainframe data through various methods such as file transfers, database queries, or APIs. The data is then transformed and loaded into cloud data warehouses, data lakes, or other storage solutions for further processing and analysis.
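The three ETL stages can be sketched in a few lines of Python. This example assumes the records have already been transferred and converted to text, and the 15-character layout (account, region, amount in cents) is hypothetical. The load step writes CSV to an in-memory buffer as a stand-in for a cloud storage target.

```python
import csv
import io
from decimal import Decimal

# Hypothetical fixed-width layout: (field name, start offset, length).
FIELDS = [("account", 0, 6), ("region", 6, 2), ("amount_cents", 8, 7)]


def extract(lines):
    """Extract: slice each fixed-width line into raw string fields."""
    for line in lines:
        yield {name: line[start:start + length] for name, start, length in FIELDS}


def transform(rows):
    """Transform: trim text, normalize case, convert cents to a decimal amount."""
    for row in rows:
        yield {
            "account": row["account"].strip(),
            "region": row["region"].upper(),
            "amount": Decimal(int(row["amount_cents"])).scaleb(-2),
        }


def load(rows, out) -> int:
    """Load: write rows as CSV to any file-like target; return the row count."""
    writer = csv.DictWriter(out, fieldnames=["account", "region", "amount"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


source_lines = ["A00001NY0012345", "B00002ca0000500"]
target = io.StringIO()
loaded = load(transform(extract(source_lines)), target)
```

Because each stage is a generator, records stream through one at a time, which matters when the extract is far larger than available memory.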
Yet another approach is to use Replication and Synchronization mechanisms to continuously capture changes made to mainframe data and propagate them to corresponding cloud data repositories in near real-time. This technique is commonly known as Change Data Capture (CDC), and IBM Data Replication is one example of a product that implements it. This approach enables cloud applications to work with up-to-date mainframe data without directly accessing the systems.
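On the receiving side, the idea reduces to replaying an ordered stream of change events against a copy of the data. The Python sketch below assumes a simple event shape (seq, op, key, row) invented for illustration. Real replication products define their own formats and delivery guarantees.

```python
def apply_changes(replica: dict, events: list) -> dict:
    """Replay change events against a replica, in source sequence order."""
    for event in sorted(events, key=lambda e: e["seq"]):
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            replica[key] = event["row"]   # upsert keeps replays idempotent
        elif op == "delete":
            replica.pop(key, None)        # tolerate an already-missing row
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# Events may arrive out of order; the sequence number restores source order.
events = [
    {"seq": 3, "op": "delete", "key": "A002"},
    {"seq": 1, "op": "insert", "key": "A001", "row": {"balance": 100}},
    {"seq": 2, "op": "insert", "key": "A002", "row": {"balance": 50}},
    {"seq": 4, "op": "update", "key": "A001", "row": {"balance": 175}},
]
replica = apply_changes({}, events)
```

Applying changes in source order is what keeps the cloud copy consistent with the mainframe. Here the delete of A002 must land after its insert, not before.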
Finally, Virtualization and Emulation techniques may be employed to create an abstraction layer between cloud applications and mainframe systems. This involves running mainframe operating systems or applications on virtualized environments or emulators within the cloud infrastructure. Cloud applications can use standard networking protocols or APIs to interact with the emulated mainframe environment.
The chosen method depends on factors such as the nature of the mainframe data, security considerations, performance requirements, existing mainframe infrastructure, latency considerations, and compatibility with cloud technologies.
Organizations often leverage a combination of these approaches to integrate cloud applications with mainframe data, enabling modernization, data access, and seamless interoperability between mainframe and cloud environments.
NOTE: CloudFrame’s application modernization software and approaches for moving mainframe workloads utilize numerous methods to access your organization’s data. The most common data sources include Db2, VSAM, and sequential files, as these are the primary sources for batch systems.