Use Power Query to Inventory your files
Welcome to GirlsWithPowerTools.com! If you found this blog expecting to find scantily clad ladies with buzz saws, boy have you come to the wrong place!
If you would like to learn more about the great suite of tools that Microsoft Power BI has to offer, then you are in the right place. The Power BI tools encompass Microsoft Excel, PowerPivot, Power Query and Power Map, all showcased in BI Sites on SharePoint. On the server side, we will also dive into SQL Server Analysis Services Tabular models (SSAS Tabular, or just Tabular)
Who am I? I am Barbara Raney, a PowerBI Architect with Slalom Consulting. I have over 20 years of practical experience with Microsoft Excel, and 10 years consulting and training my clients to more effectively use Excel.
Why start my own blog? In all of my years working with Microsoft Excel, I love to figure out all of the cool things you can do with spreadsheets and have been the “go to girl” whenever questions come up. I absolutely love it when people say that “Excel can’t do that” – only to show them – “Yes it can!!”
As I mentioned, I don’t work for Microsoft, but I do a ton of work WITH them, which has really taken my love of all things Excel to a whole new level. In this blog, we are going to go on a journey learning cool new tips, tricks and techniques using all of the “Power Tools”. And “the Girls” mentioned in the website name? They are a group of wonderful geek girls who will be helping me along the way – and hopefully guest-blogging!
Many of you already know about PowerPivot, which is a great data modeling tool, and now the new kid on the block is POWER QUERY! This wonderful little add-in will work in Excel 2010, 2013 and Office 365.
Inventory all of the files on a drive
In our first adventure, we are going to use Power Query to take inventory of all of our files and then find where there are duplicates! This is a great way to free up some space on your disk. 🙂
First, download the Power Query add-in for Excel.
If you use Excel 2013, make sure you have the latest version at Windows Update.
Open a blank worksheet and check out the Power Query ribbon:
You can use the Get External Data chunk to load data from a ton of data sources:
The last item in the first list, “From Folder”, is what we are going to use.
One the Get External Data chunk, in the From File dropdown, select From Folder
Select the drive (or folder) that you would like to inventory and click OK
Voila!! Now you see a list of all of the files in all of the subfolders in the path provided.
The inventory is missing one key ingredient: file size. For a file to be a true duplicate, the Name, Date Modified and Size all need to be the same.
Expand Attributes Column and select Size, then click OK.
Next we are going to filter out any non-Excel files by applying a filter to the Extension column
Take a look at the Query Settings pane on the right. In the Applied Steps area, you will see the three steps we performed on our data. If you click back on the Source step, you can preview what the data looked like when you started, as you select each step, its action is applied. This is VERY handy when you are doing a particularly long transformation and you need to backtrack what you did without losing your work.
Rename your query to Inventory, check the Load to Worksheet box, and uncheck the Load to Data Model box, and select Apply and Close from the Home ribbon.
Note: If you have Excel 2010, your only option is to Load to Worksheet.
Your beautiful list is now in Excel! The right pane is the Workbook Queries pane, which can be toggled on and off from the Manage Queries chunk of the Power Query ribbon.
Find the Duplicates
We could use simple Excel formulas or conditional formatting to find our duplicates, but this is a post about Power Query, so we are going to use it to find our dups.
We need a second query which starts out just like the first, but then goes just a bit further. Even though it is short, I am going to show you a short cut to create a copy.
In Excel, right click on your query and select Duplicate. Yes, it is just that easy!
Right click it again, and select Edit, as we need to take this one just a bit further.
We only need to keep in the query the columns that define an exact match. Highlight Name, Date Modified and Attribute.Size, and in the Remove Columns dropdown, select Remove Other Columns
Next, highlight all three columns again, and select Group By from the Transform chunk.
Ensure that all three columns are in the Group By section and that the Operation is set to Count Rows, then click OK.
Now you have a count of how many times each file appears in the folder. In my case, I filtered the list to remove the ones and I have 4 files with 4 version each!
Now it would be nice to not have to use VLookup or something like that to find out where they live in the folder structure, so I am going to Merge this list with my full inventory to create a second list of just duplicates!
From the Combine chunk, select Merge Queries, then select holding your shift key down, select Name, Date Modified and Attribute.Size from both tables and check the box to Only include matching rows and click OK. I don’t need the whole inventory again, just the records with the duplicates.
Expand the NewColumn column, and select Folder Path and click OK.
Double click on the NewColumn.Folder Path column and rename to Path. This is the finished query! Again, you can see each of the applied steps. If you want to run this query for a different path, simply right click on the Source step in both queries and change the path to the new one!
Go ahead and Apply and Close to load your dups into Excel, and clean up that drive!!
Hope you like our first foray into Power Query – Come back next time for a cool Distance Calculation with Power Query!