DATA ENGINEERING & DATA WAREHOUSING





Data engineering is sometimes used as another name for data architecture. The work of a data engineer is to gather data, store it, run batch and real-time processing on it, and serve it through an API to data scientists, who can then interpret it easily.

Many big data tools on the market cover each of these steps, so it is important to keep the choice of any particular tool open rather than tying the pipeline to a single product.

A good data engineer combines extensive knowledge of databases with strong engineering practice. That practice includes handling and logging errors, monitoring the system, building human-fault-tolerant pipelines, understanding what is needed to scale up, addressing continuous integration, knowing database administration, maintaining data cleanup, and ensuring a deterministic pipeline.
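As a minimal sketch of two of these practices, handling and logging errors and keeping a pipeline step deterministic, consider the Python fragment below. The record layout (`id`, `amount`), the logger name, and the function names `clean_record` and `run_step` are all hypothetical, chosen only for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")  # hypothetical logger name


def clean_record(raw: dict) -> dict:
    """Normalize one raw record; raises ValueError if the id is missing."""
    if "id" not in raw:
        raise ValueError("missing id")
    return {"id": int(raw["id"]), "amount": round(float(raw.get("amount", 0)), 2)}


def run_step(records: list) -> tuple:
    """Process records so that the same input always gives the same output.

    Bad records are logged and set aside instead of crashing the pipeline.
    """
    good, rejected = [], []
    for raw in records:
        try:
            good.append(clean_record(raw))
        except (ValueError, TypeError) as exc:
            log.warning("rejected record %r: %s", raw, exc)
            rejected.append(raw)
    # Sort by id so the output order does not depend on arrival order.
    good.sort(key=lambda r: r["id"])
    return good, rejected
```

Setting rejected records aside, rather than dropping them silently, is one way to keep a step tolerant of human mistakes in the input while leaving a trail that monitoring can pick up.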

A data engineer may, for example, program an iterative machine learning algorithm to run on a Spark cluster. Even though these tracks are separate in our program, some companies prefer candidates who are comfortable with aspects of both data science and data engineering.

A data warehouse, also called an enterprise data warehouse, is a system used for reporting and data analysis. It stores current and historical data and is used to produce analytical reports for knowledge workers throughout the enterprise. Examples of such reports range from yearly and quarterly comparisons and trends to detailed daily sales analysis.
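The kinds of reports mentioned above can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. The `sales` table, its columns, and the sample rows are assumptions made up for this illustration; a real warehouse schema would be considerably richer.

```python
import sqlite3

# An in-memory database stands in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2022-03-01", 100.0),
        ("2022-03-01", 50.0),
        ("2022-11-15", 200.0),
        ("2023-03-01", 120.0),
    ],
)


def yearly_totals(conn):
    """Total sales per year, for year-over-year comparison."""
    return conn.execute(
        "SELECT strftime('%Y', sale_date) AS year, SUM(amount) "
        "FROM sales GROUP BY year ORDER BY year"
    ).fetchall()


def quarterly_totals(conn):
    """Total sales per (year, quarter); months 1-3 map to quarter 1, and so on."""
    return conn.execute(
        "SELECT strftime('%Y', sale_date) AS year, "
        "(CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter, "
        "SUM(amount) FROM sales GROUP BY year, quarter ORDER BY year, quarter"
    ).fetchall()


def daily_totals(conn):
    """Total sales per calendar day."""
    return conn.execute(
        "SELECT sale_date, SUM(amount) FROM sales "
        "GROUP BY sale_date ORDER BY sale_date"
    ).fetchall()
```

Because the warehouse keeps historical rows alongside current ones, the same table can answer both the long-range comparison and the day-level question.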

The data stored in the warehouse is uploaded from operational systems (such as marketing and sales). The data may pass through an operational data store for additional operations before it is used in the data warehouse for reporting.
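The flow from operational systems through an operational data store into the warehouse could look roughly like the sketch below. The sample rows, the field names, and the functions `stage` and `load_warehouse` are hypothetical, and the "additional operations" are reduced here to tagging the source and normalizing customer names.

```python
# Hypothetical rows as they might arrive from two operational systems.
sales_rows = [
    {"customer": " Alice ", "amount": "10.50"},
    {"customer": "bob", "amount": "4"},
]
marketing_rows = [{"customer": "ALICE", "campaign": "spring"}]


def stage(rows, source):
    """Operational data store step: tag the source, normalize customer names."""
    return [
        {**row, "source": source, "customer": row["customer"].strip().lower()}
        for row in rows
    ]


def load_warehouse(staged_rows):
    """Group staged rows by customer into one integrated structure for reports."""
    warehouse = {}
    for row in staged_rows:
        warehouse.setdefault(row["customer"], []).append(row)
    return warehouse


ods = stage(sales_rows, "sales") + stage(marketing_rows, "marketing")
warehouse = load_warehouse(ods)
```

After staging, the sales and marketing rows for the same customer land under a single key, which is the kind of cleanup that makes cross-system reporting possible.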

The various categories of data warehousing are:

Offline operational data warehouse: Data warehouses in this stage of evolution are updated on a regular time cycle from the operational systems, and the data is stored in an integrated, reporting-oriented data store.

Offline data warehouse: Data warehouses at this stage are updated daily from data in the operational systems, and the warehouse data is stored in a data structure designed to facilitate reporting.

On time data warehouse: Online integrated data warehousing represents the real-time stage of data warehouses, in which the data in the warehouse is updated for every transaction performed on the source data.

Integrated data warehouse: These data warehouses assemble data from different areas of the business so that users can look up the information they need across other systems.
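The practical difference between the offline stages and the on time stage is when the warehouse sees a new transaction. The sketch below illustrates that contrast in Python; the class names and the single running total are hypothetical simplifications, not a description of any particular product.

```python
class OfflineWarehouse:
    """Refreshed on a schedule: new transactions appear only after refresh()."""

    def __init__(self):
        self.total = 0.0

    def refresh(self, source):
        # Recompute from the full operational data at the end of a cycle.
        self.total = sum(source)


class RealTimeWarehouse:
    """Updated for every transaction performed on the source data."""

    def __init__(self):
        self.total = 0.0

    def on_transaction(self, amount):
        self.total += amount


source = []  # stands in for the operational system's transaction log
offline, realtime = OfflineWarehouse(), RealTimeWarehouse()

for amount in (10.0, 5.0):
    source.append(amount)
    realtime.on_transaction(amount)  # visible in the warehouse immediately

# At this point realtime.total reflects both transactions,
# while offline.total is still stale until the next scheduled refresh.
offline.refresh(source)
```

Between refreshes the offline warehouse reports stale figures, which is acceptable for periodic reporting but not for decisions that depend on the latest transaction.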