noun • [day-ta trans-for-may-shun] • the process of altering the structure, content, or other characteristics of data to make it usable for your needs
Overview
Very rarely does a dataset meet 100% of the needs of those using the data. There. We said it. But, just because this reality is inevitable doesn’t mean there aren’t ways to work around it.
Data transformation is the process of applying a few changes or many (you decide!) to data to make it valuable to you. Some examples of the types of changes that may take place during data transformation are merging, aggregating, summarizing, filtering, enriching, splitting, joining, or removing duplicated data.
Data transformation is often a required step in further data management tasks like data conversion and data integration. It is a key step in each of these processes because it helps shape, standardize, and create consistency across datasets. Whether you also need to convert data into a new file format, in addition to transforming it, depends on your organization’s needs. In many cases, transformation without conversion will suffice. Ultimately, data transformation will help you move data into its target destination efficiently and effectively.
More and more businesses and organizations rely on data transformation to handle the tremendous amounts of data being generated from emerging technologies and new applications. By transforming data, organizations not only maximize the value of their data but can also manage it more simply and reduce the dreaded feeling of information overload.
Data Transformation Steps
There are five basic steps involved in data transformation that are important to know whether you are creating, implementing, or making use of the transformation workflow. These steps are necessary to consider no matter how simple or complex the data transformation will be. By following this rough guideline, you’ll be able to properly plan and process the data to achieve your data goals.
1. Data Discovery and Data Profiling
Interpret and make sense of the exact data you are working with (so you can turn what you have into what you want).
Note the detailed information contained within the data, such as its attributes, structure, and what exactly you need to transform (knowing the file extension is not enough!).
Here are some example questions you may ask yourself:
On structure: Is my data tabular, a raster (pixelated), or three dimensional?
On attributes: Is there additional metadata? What are the column headers describing? Is there any data missing?
On transformation: What units are measurements recorded in, and do I want to change them? Is all the data recorded in a consistent way?
Identify if the data requires any cleaning before transformation.
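If you’d like a quick, hands-on way to answer questions like these, a short script can profile a dataset before you plan any transformation. The sketch below uses the pandas library and a hypothetical file named survey_points.csv; it’s only an illustration of the profiling step, not part of any specific tool.

```python
import pandas as pd

# Load a hypothetical tabular dataset (assumption: a CSV named survey_points.csv)
df = pd.read_csv("survey_points.csv")

# Structure: how many rows and columns, and what type is each column?
print(df.shape)
print(df.dtypes)

# Attributes: what do the column headers describe, and is any data missing?
print(df.columns.tolist())
print(df.isna().sum())

# Transformation clues: sample some values to spot inconsistent units or date formats
print(df.head(10))
```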
2. Data Mapping
Establish a well thought out plan that identifies what elements of the data will be transformed and how that will happen.
If you are transforming data for compatibility reasons (so your chosen applications can access data), determine what parts of the data should be changed and what needs to be left as is.
Ensure that your plan considers whether or not data will be lost during transformation and how to mitigate losses if needed.
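A mapping plan doesn’t have to live in a formal tool; even a small, explicit set of source-to-target rules makes the plan easy to review. The sketch below is a hypothetical Python example, with made-up field names and one unit conversion, just to show what a documented mapping might capture.

```python
# A hypothetical mapping plan: which source fields change, how, and which stay as-is.
FIELD_MAP = {
    "elev_ft":   {"target": "elevation_m", "convert": lambda ft: round(ft * 0.3048, 2)},
    "date_str":  {"target": "observed_on", "convert": None},  # format is fixed later in the workflow
    "site_name": {"target": "site_name",   "convert": None},  # left as is for compatibility
}

# Fields deliberately dropped during transformation (documenting potential data loss up front)
DROPPED_FIELDS = ["internal_notes"]

def apply_mapping(record: dict) -> dict:
    """Apply the mapping plan to a single record (a plain dict)."""
    out = {}
    for source, rule in FIELD_MAP.items():
        if source in record:
            value = record[source]
            out[rule["target"]] = rule["convert"](value) if rule["convert"] else value
    return out
```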
3. Create a Workflow
Decide if you will transform data by writing a script or by using a data transformation tool. Consider some of these questions:
What is the current expertise on the team and are there any gaps to fill to successfully complete our data transformation project?
Will the structure of my data and transformation requirements change over time? Will I be able to easily update the workflow to meet these changing needs?
Do I need to consider if others will be using the workflow I create? Do I need to make it easy for others to understand in case I’m not around?
FME is an example of an easy-to-use visual data transformation tool.
Identify your input and output data file formats within your workflow.
Identify the needs of the data transformation (compatibility, enriching, etc.) and ensure these requirements are met within your workflow.
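If you do decide to script the transformation yourself rather than use a visual tool, keeping the workflow in clearly separated stages makes it easier to update as requirements change and easier for others to follow. The skeleton below is a generic, hypothetical structure (CSV in, CSV out) and is not tied to any particular platform.

```python
import csv

def read_input(path):
    """Stage 1: read the input data (a CSV reader here, but any input format fits)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(records):
    """Stage 2: apply the rules agreed on in the data mapping step (placeholder logic)."""
    return [dict(record) for record in records]

def write_output(records, path):
    """Stage 3: write the data in the target format (a CSV writer here)."""
    if not records:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    write_output(transform(read_input("input.csv")), "output.csv")
```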
4. Run Your Workflow
Connect the input data to your workflow and test what you’ve created. When you run your workflow, it will rebuild the data to match your target format. The workflow is essentially a data restructuring process.
Running the workflow should result in your old data being presented in a new way. Whether or not the data is converted into a new file format is up to you. For example:
A JSON file with new keys and values added to each object
A CSV file with dates updated to be recorded in a consistent way
A Shapefile’s attributes as a KML file with only key landmarks
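To make the second example above concrete, here is a small, hypothetical sketch that normalizes inconsistently recorded dates to a single ISO format; the list of input formats is an assumption for illustration.

```python
from datetime import datetime

# Date formats we assume appear in the raw data (an assumption for this example)
KNOWN_FORMATS = ["%m/%d/%Y", "%d-%m-%Y", "%Y.%m.%d"]

def normalize_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), or raise if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

print(normalize_date("07/04/2021"))   # -> 2021-07-04
print(normalize_date("04-07-2021"))   # -> 2021-07-04
```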
5. Review the Data
Review the quality and accuracy of the output data.
Create a list or audit of issues if necessary.
Based on your findings, if needed, review the workflow you’ve created, make changes, and try again.
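A few automated checks can take the guesswork out of this review. The sketch below validates transformed records against an assumed target schema and value range; the field names and limits are hypothetical.

```python
# Hypothetical output records from the workflow, for illustration
transformed_records = [
    {"site_name": "Ridge A", "elevation_m": 1250.5, "observed_on": "2021-07-04"},
    {"site_name": "Ridge B", "observed_on": "2021-07-05"},  # missing elevation_m
]

def review_output(records):
    """Collect quality issues into a list (an audit) instead of stopping at the first one."""
    required = {"site_name", "elevation_m", "observed_on"}  # assumed target schema
    issues = []
    for i, rec in enumerate(records):
        missing = required - rec.keys()
        if missing:
            issues.append(f"record {i}: missing fields {sorted(missing)}")
        elev = rec.get("elevation_m")
        if elev is not None and not (-500 <= float(elev) <= 9000):
            issues.append(f"record {i}: elevation_m out of expected range: {elev}")
    return issues

for problem in review_output(transformed_records):
    print(problem)
```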
Data Transformation with FME
One of the simplest ways to transform data is via data integration software platforms like FME that specialize in data transformation. FME takes away the need for writing scripts so anyone, no matter their technical background, can easily create and perform their own data transformation workflows.
Transformers are FME’s standard data transformation tools that are used to modify data any way you’d like. You can think of transformers as packaged actions, functions, or pre-written code snippets. There is a variety of transformers for you to choose from, and you can add them to your workflow in any logical order that you’d like so that data is transformed exactly for your needs.
If you’re a developer, no need to worry, FME isn’t here to replace you. Just like no single person can ever know everything, no single software will ever be able to do everything. That’s why you can insert your own pieces of code, such as Python, R, or JavaScript, directly into a workflow so together you and FME can build something great. Now, instead of writing an entire data transformation script, you can create workflows quickly and simply, giving you more time for the tasks that matter most.
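As a hedged illustration only, the snippet below shows the kind of small, self-contained Python function you might embed at one step of a workflow: enriching each record with a derived attribute. The field names are hypothetical, and the snippet is written in plain Python rather than against any tool-specific API.

```python
def enrich(record: dict) -> dict:
    """Add a derived attribute to a record; hypothetical fields for illustration."""
    # Assumption: records carry 'area_sq_m'; we add a categorical 'size_class' alongside it.
    area = float(record.get("area_sq_m", 0))
    record["size_class"] = "large" if area >= 10_000 else "small"
    return record

print(enrich({"name": "City Park", "area_sq_m": 42000}))
# {'name': 'City Park', 'area_sq_m': 42000, 'size_class': 'large'}
```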
Overall, whether you’re a developer or not, FME’s capabilities and built-in transformers give you the flexibility and choice to customize and extend your workflow however you want.
Here’s How it Works
Transformers are used directly in FME Form. It’s easy for you to add transformers to your workflow to create your own custom data transformation process. Here’s how:
Add the transformer by typing its name anywhere within the workspace
Drag-and-drop your transformer where you want it
Connect transformers together using the input and output ports to link your workflow
Each transformer has parameters that you have control over to give you the flexibility to transform data exactly how you want to. If you’re new to data transformation, not to worry. There’s documentation to help you understand how the parameters work. Before you know it, your new visual workflow will be complete!
Now that you have a completed data transformation workflow, you can easily re-use parts of or all of your workflow for additional data transformation tasks. To make it easier for future you and others to understand your logic behind how you created your original workflow, make use of the Annotations and Bookmark features to add notes directly into your workspace.
That’s all there is to it! With your own creative skills and data expertise, working with FME can get you where you need to go.
FME is recognized as the data integration platform with the best support for spatial data worldwide. However, it can handle much more than just spatial data and is easily used by IT and business professionals. FME has a range of supportive data transformation tools called transformers that make it easy to integrate over 450 formats and applications. With FME you have the flexibility to transform and integrate exactly the way you want to.
Safe Software, the makers of FME, are leaders in the technology world who strive to stay one step ahead of data integration trends. FME is continuously upgraded to support new data formats, updated versions of existing formats, and large volumes of data. Gone is the idea that individual departments must work in their data silos, with IT structures limiting the company’s potential to truly work as one. Data should be able to flow freely no matter where, when, or how it’s needed.