DAY 4: Repurposing Data Versus Remanufacturing Data

Medha Mankala
5 min readAug 19, 2020

Creating reusable datasets using Excel:

This article is fourth one in the series. we are going to learn about repurposing and remanufacturing of data.

Repurposing data:

if you have a good set of data, then you want to repurpose that data, and not have to rebuild it every time there’s a change to the request. If you have verified data, and it’s been vetted through your team, then you want to be sure you keep it in a way you can reuse it without having to start back over. Also, for historical purposes, it’s important to keep up with the sets. This is vital for all projects. But, really critical depending on how you access, or receive the data. You might only receive a spreadsheet, or have limited access to the databases. You may even have to translate from paper to data sets. .Every project may be a little bit different in how you approach repurposing the files. But, simple techniques can give you proof, and the ability to not have to start over with new requests.

Understanding Source Data:

Source data is where the data is initiated and where it lives. It’s most likely to be accurate in that particular place. It’s important to understand that data does live in a lot of different places. Source data comes from systems and the data structure rarely changes.you need to determine the best source for the data and where it lives. You always want to build reports from your sources if you can.you won’t always have access directly to the source data but when you do, it can lead to more accurate results.

Creating reusable data:

In addition to creating calculations you may find that reusable data can benefit you greatly. Let’s take a look at an example. We’ll work with dates. If you found that every time you worked with your data you were having to create multiple date formulas, you may find that creating a spreadsheet that you can import into your database to use a table of dates, will be more helpful than having to repetitively build those formulas. Let’s take a look at how we can use this. I’ll go to my database. I’ll go to external data. I’ll bring in an Excel file. I’ll browse to my exercise files. I’ll bring in my table of dates. I’ll go ahead and click okay. The first row does contain column headings. I’ll choose next. I’ll keep all the fields just as they are. I’ll choose next. I don’t need a primary key. I’ll choose next and then I’ll choose finish. I’ll go ahead and close my export. Now I have a table of dates so any time I need to work with these dates I can link them to other dates. Let me show an example. I’ll go to create. I’ll go to query design. I’ll bring in my table of users. I’ll bring in my table of dates. I’ll go ahead and choose close. I’ll bring in the date of migration as my example here. That date of migration is actually stored as text. You’ll find this a lot in data systems when it comes to dates. I want to be able to work with dates though so we already know if we want to calculate dates we have to have dates. So I’ve created my table of dates. I can actually link my date of migration to my final text match that’s appropriate for this list and then I can bring in my official date stamp. If I need other information like day, month and year, instead of rebuilding those calculations, I just join the table one time and I have a reusable data set. Let’s run it. Dates are infamous in systems as being stored as text. You never know until you start working with them exactly what type of date work you might need to do.

Building data sets to filter data:

You can filter data in many ways, and you may find that you’re doing the same filters over and over again. You can build data sets with queries that actually act as filters. Let’s take a look. We’re going to build a query on our users to identify our managers, and we’re going to work with our business roles as well. I’ll go to create, query design, and bring in the users. I need a list of managers, so that when I build my reports, I can build reports that are manager related reports or just all staff or maybe all staff that’s not managers. So the easiest way to accomplish this is to create a user list of managers. I’ll double click my user ID, and I’ll go ahead and grab their first and last name. Based on our business roles, we know that all managers are identified with a user ID of greater than 5,000, so I’ll use the criteria for that. So if their user ID is greater than 5,000, then we know that they’re identified as a manager. I’ll go ahead and go to my design tab, and run my query. I have my list of managers. I’ll go ahead and save this query as managers. So once we’ve identified the list through a query, we can then add it to other queries. Let’s take a look at that. I’ll go to create. I’ll go to query design. I’ll go to my queries tab. We’ll use the Ability to Find Data Query, and we’ll also bring in our managers. I’ll choose close. I’ll grab the user ID, and I’ll join it to the user ID in managers. And based on the rules of joins, it’s an interjoin. I’ll be able to see all the video information for just my managers. I’ll go ahead and double click the asterisk, and it’ll bring all my fields in. I’ll go to the design tab, and run my query. Now this query is a filtered list of all of the same information, but now it’s manager specific. And all I had to do was bring in my list of managers. Any report I want to write and filter it by just managers, I simply need to bring in the managers query, and it’ll automatically filter based on the joins. Queries are an active part of analysis, and we can use them to refine and display results. Now you know you can use them to filter.

--

--