Introduction to ETL
Simply described, extract, transform, and load (ETL) is the integration of data from different sources into one location, where it can be used as knowledge to support good decision making. ETL (n.d.) notes that a closely related approach is known as “ELT.” The history of ETL dates back to the 1970s, when organizations began drawing on several different sources for their information. In the 1980s and 1990s, data warehouses grew. Because different computers and systems ran different software, much of the data was incompatible from one system to the next (ETL, n.d.). That is why ETL became so important.
What Exactly Is ETL?
ETL is the abbreviation for extracting data, transforming the data, and loading it (Gour et al., 2010; What is ETL?, n.d.). At its core, the process copies data from one source to another. Extracting means reading the data out of its source system. During the transforming step, the data is cleansed and altered to fit the target. Loading then writes the result into the destination, such as a data warehouse. Bansal (2014) states that big data gives more purpose to ETL by enabling decision making and compiling the information together. What is ETL? (n.d.) states that ETL makes it more feasible to transform data into business intelligence. Gour et al. (2010) mention that unless the data is properly extracted, cleaned, and transformed, the whole querying process is impossible.
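The three steps can be illustrated with a minimal sketch. It assumes a small, made-up CSV export as the source and an in-memory SQLite database as the target; the names RAW_CSV, extract, transform, load, and the sales table are hypothetical and are not taken from any of the cited sources.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a small CSV export with inconsistent formatting.
RAW_CSV = """name,amount
 Alice ,10.50
BOB,3
,7.25
carol,not_a_number
"""


def extract(text):
    """Read rows out of the source; the source itself is left unchanged."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cleanse the rows: tidy names, parse amounts, and drop unusable rows."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # the amount could not be parsed as a number
        if not name:
            continue  # a critical field is missing
        cleaned.append((name, amount))
    return cleaned


def load(rows, conn):
    """Write the cleansed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(result)
```

Only the two rows that survive cleansing reach the target table, which shows why the quality of the extract and transform steps determines what can later be queried.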
ETL Challenges
Vassiliadis et al. (2002) state that different ETL tools can be used for extracting data, cleansing the data, or customizing and inserting the data into databases. Most experts state that the ETL process takes up about 60-80% of the time spent dealing with the data (Gour et al., 2010). The authors add that the original data can come from any source, including Microsoft Excel spreadsheets, mainframe applications, a CRM system, or an ERP application. Several challenges are encountered with ETL, including poor query performance, difficulty moving the data, prolonged load times, difficulty maintaining business rules, missing critical data, and end users who lack access to the business rules (Gour et al., 2010). A different form of information integration, called mediation, queries the data in the original source at request time rather than extracting it in advance, which can save time.
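The contrast with mediation can be sketched as follows. This is only an illustration under stated assumptions: the two in-memory lists stand in for a CRM and an ERP source, and the names CRM_ROWS, ERP_ROWS, query_crm, query_erp, and mediator are hypothetical.

```python
# Two hypothetical sources that describe customers with different field names.
CRM_ROWS = [
    {"cust_name": "Alice", "city": "Oslo"},
    {"cust_name": "Bob", "city": "Rome"},
]
ERP_ROWS = [
    {"customer": "Carol", "location": "Oslo"},
]


def query_crm(city):
    """Answer the query against the CRM source and map it to a shared schema."""
    return [
        {"name": row["cust_name"], "city": row["city"]}
        for row in CRM_ROWS
        if row["city"] == city
    ]


def query_erp(city):
    """Answer the query against the ERP source and map it to a shared schema."""
    return [
        {"name": row["customer"], "city": row["location"]}
        for row in ERP_ROWS
        if row["location"] == city
    ]


def mediator(city):
    """Forward the query to every source at request time and merge the answers."""
    results = []
    for wrapper in (query_crm, query_erp):
        results.extend(wrapper(city))
    return results


oslo_customers = mediator("Oslo")
print(oslo_customers)
```

Nothing is copied into a warehouse here: each request reads the sources as they currently are, so there is no load step, at the cost of doing the schema mapping on every query.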
Conclusion
Although different kinds of software may be incompatible with one another, there are still methods, such as ETL, for making their information compatible and useful. As previously mentioned, however, challenges can be encountered along the way.
References
ETL. (n.d.). https://sas.com
What is ETL? (n.d.). https://talend.com/resources/what-is-ETL
Bansal, S. (2014). Towards a semantic extract-transform-load (ETL) framework for big data integration.
Gour, V., Sarangdevot, S., Tanwar, G., & Sharma, A. (2010). Improve performance of extract, transform, and load (ETL) in data warehouse. International Journal on Computer Science and Engineering, 2(3), 786-789.
Vassiliadis, P., Simitsis, A., & Skiadopoulos, S. (2002). Conceptual modeling for ETL processes. Association for Computing Machinery, pp. 14-21.