Search Text using List of Keywords

How many keywords do you need to search for?  Can you say “Needle in a haystack”????

So, you are analyzing social media posts and you need to flag any posts that contain certain keywords.   The problem is your list contains hundreds of keywords!  Today, I will show you a Power Query function which will easily perform the keyword search.  

Basic Steps

  1. Create your keyword list – as a List.  Once you get your keyword list as a table, use the Convert Table to List function on the Transform ribbon to convert to data type List.
  2. Create the function

    (String) => 
    let
        //check if values in MyKeywords is in String
        MatchFound = List.Transform(List.Buffer(MyKeywords), each Text.Contains(String, _, Comparer.OrdinalIgnoreCase)), 
    
        //index position of match found
        Position = List.PositionOf(MatchFound, true),
    
        //return null if Position is negative
        Return =  if Position < 0 then null else MyKeywords{Position}        
    in
        Return

     

  3. Add a custom column to the query with the text to be searched and invoke the function. 

 And there you have it!  Use this technique to search any text across any list of words.

Active Tickets

Active Tickets

Active Items, Open Tickets, Outstanding Items – All names for basically the same calculation. In this video, we will discuss how to use DAX to enable the count of any “Active” item. In the sample data, there is only a stage created date and no stage end date, so we first create a stage expire date as a calculated column, then leverage the new column in our Active Ticket measure. Finally, we will display our Active Ticket measure in a gantt chart visualization created using a Power BI matrix.

Note: If your data has both a create and end (expire) date per line, then you can skip to the Active Ticket measure at timestamp 9:40 in the video.

Just the Code:

  1. Setup: In my sample data, each row contains a Ticket, Stage and Create Date – but no stage end or Expire Date. There is a Closed date but only for the last stage of the Ticket, not the intermediate stages.
  2. Create calculated column LowIsNew – ranking all stages within a ticket from newest to oldest
    LowIsNew =
      CALCULATE (
        COUNTROWS ( Ticket_History ),
        FILTER (
          ALL ( Ticket_History ),
          Ticket_History[Ticket] = EARLIER ( Ticket_History[Ticket] )
            && Ticket_History[Created Date] > EARLIER ( Ticket_History[Created Date] )
        )
      )    + 1
  3. Create second calculated Column RowExpiration which calculated the Created Date for the next event and sets it as the EXPIRE var (varaiable). The IF in the Return also checks if the status of the row is Closed, in which case the EXPIRE is replaced by Closed date.
    RowExpiration =
      VAR EXPIRE =
        CALCULATE (
          MIN ( Ticket_History[Created Date] ),
          FILTER (
            ALL ( Ticket_History ),
            Ticket_History[Ticket] = EARLIER ( Ticket_History[Ticket] )
              && Ticket_History[LowIsNew] = ( EARLIER ( Ticket_History[LowIsNew] ) - 1 )
          )
        )
      RETURN
        IF (
          Ticket_History[Status] = "Open",
          IF ( ISBLANK ( EXPIRE ), TODAY (), EXPIRE - 1 ),
          Ticket_History[Closed Date]
        )
    
  4. In your data model, ensure you have a date table, mine is Dates, that has a date field with contiguous values for all of the facts in the Ticket table. The key item here is DO NOT CREATE A RELATIONSHIP BETWEEN Dates AND Tickets.
  5. Finally, create Measure Active Tickets which will count the Active Tickets for each day. This measure is explained in detail starting at 9:40 in the video.
    Active Tickets = CALCULATE (
        DISTINCTCOUNT( Ticket_History[Ticket]) ,
        FILTER (
            Ticket_History,
            (
               Ticket_History[Created Date] <= MIN ( Dates[Date] )
                    && Ticket_History[Row Expiration] >= MAX ( Dates[Date] )
                   
            )
        ),
        VALUES (Ticket_History[Ticket]))
  6. In the video we transform a simple matrix of counts into a Gantt chart using conditional formatting.

And there you have it! Use this technique to calculate the counts of ANYTHING which has a begin and end date (even if it does not have an end date).

If you would like to play with the Power BI Desktop file, you can find it here.

Have Fun! Barbara

 

Fun With Flags

OK, so its not Country Flags, we are really going to play with Tags, or keywords, but I couldn’t resist this image…

Today we are going to work with a field with delimited text values which we need to separate to enable processing within the Power BI Data Model.

Welcome to my first VIDEO blog. 

In the video, we will walk through the steps in detail.  In addition, I have summarized the steps below the video.

 

For those of you who would prefer just the steps, here they are…

To create a Flags table

  1. In the Queries List, right click on the Product query (table containing flags), and select Reference (this ensure the two queries stay connected).
  2. Rename this query to Flags.
  3. Cntl click on ProductID and Flags, right click and select Remove Other Columns. This should leave this query with just the two columns.
  4. Select the Flags column, from the ribbon, select Split.
  5. Expand Advanced and select Split into Rows – each flag term will now be in its own row with the corresponding ProductID
  6. File > Close and Apply
  7. In the Relationship pane,  double click the relationship (line) between Product and Flags
  8. Update the Cross Filter Direction to Both
  9. Add the Flags field to a slicer and slice away!

To use Text Filter Custom Visual

  • In the Visuals pane, select the Ellipsis (…), then select From the Marketplace (or Store, depending on your version)
  • Search on “Text” and select the Text Filter, add it to your report
  • Add the new custom visual (icon is a spy glass) to your report canvas, and select any field with text that you want to filter.

Note: This is different from the Search on a regular slicer as it actually SELECTS all of the items that match your phrase.

Have Fun!!

Blog Reboot with Data Insight Summit 2017

DataInsight

James Philips’ keynote to kick off the 2017 Data Insight Summit in Seattle was nothing less than AMAZING!!

Coming soon to Power BI will be the ability to embed PowerApps within a PowerBI canvas, creating bookmarks which are aware of the filters on which they were set, and the ability to drill to other report while respecting filters.   In addition, they noted that Power BI Premium is now generally available.

There were SO many upcoming items… I am giddy with anticipation… A-MAZ-ING!!

 

Today’s June Power BI Desktop Update was no slouch either –

  • Data bars for new table & matrix (preview)
  • Markers on line, area, & combo charts
  • Font family setting on Visuals
  • Force zeros to align between the two axes on Combo charts
  • Horizontal Image Slicer
  • Updates which will allow more Accessible Reports

Well done.  Looking forward to working with the June release and getting my hands on the new “toys” coming in the next quarter.

Well, I am off to get ready to present “What does Raspberry Pi have to do with BBQ” on day 2 of summit, will talk more about that tomorrow.  Bye for now..

Barbara

 

 

 

 

 

Microsoft Data Insight Summit was a blast!

MSDataInsightSummit

Just wanted to drop a quick post about the Microsoft Data Insight Summit held in Seattle this last week.   This was an inaugural event which focused on all things Power BI, with both Power BI Desktop and Excel.  It was really fun to meet up with members of the PowerBI community from all over the world.  I got to see Alberto Ferrari deliver his Advance DAX class, and learn to be a Data Scientist from Danielle Dean. 🙂

During the Opening Keynote, they shared some incredible upcoming features!  My Favorite is being able to set Row Level Security on a Data Model in the service!  WOW!  This will truly be a game changer for most of my clients!  A few other great features they mentioned coming “this Spring” are

  • Drill Thru
  • Usage Metrics are coming to the Admin Center
  • Conditional Formatting for tables
  • and many more!

Looking forward to next year!

 

Power Query Refresh fails after making certain changes in PowerPivot Model

As you can imagine, I am really enjoying playing with Power Query and Power Pivot.  PowerPivot models have been around a bit longer so the software is pretty stable.  Power Query is still pretty new, so occasionally we run into some “gotchas” with the software, this post is intended to help you avoid a few of these “gotchas”.

Members of my team were happily creating PowerPivot models with the data sourced in Power Query, and out of the blue, mysterious errors kept occurring:

  • Exception from HRESULT: 0x800A03EC
  • Could not refresh table ‘Survey’ from connection ‘Power Query – Survey’.  Here’s the message we got:
    Out of line object ‘<xxx>’ referring to ID(s) ‘<xxxGUID>’ has been specified but has not been used.  The ‘<xxx>’ column does not exist in the rowset.

Upon further research and happily landing on this thread (and the brilliant Miguel Llopis) led us understand that there are some common “gotchas” which cause your Power Query to be marked Read Only (bad), and thus not refreshable.  Once you are in this state, the only way out is to remove the query from the Data Model and put it back in (see steps below for renaming a PowerPivot table).

To help keep you out of trouble, in a PowerPivot model where the data is sourced from Power Query do NOT perform any of these operations DIRECTLY in the PowerPivot model:

(If you are using the native Power Pivot data connections for ALL tables in the model, these do not apply)

 

  • Don’t Edit Table Properties – it seems like you could remove a column from the model by de-selecting it from Table Properties, but don’t do it!
    • Go back into the Power Query source query for the table and use a Remove Column function to remove the column, then reload to Power Pivot.

 

  • Do not Delete, Rename or change the Data Type of a Column in a Power Pivot table.
    • Go back into the Power Query source query for the table and use a Remove Column, Rename Column, or Change Data Type function to make the changes, then reload to Power Pivot. 

 

  • Don’t Delete a Table directly in the PowerPivot window
    • Go back into the Power Query source query for the table uncheck the box for “Load to Data Model”. If you REALLY don’t need the query anymore, you can delete the query and it will also remove it from the PowerPivot data model. 

 

  • Don’t Rename a PowerPivot table (and underlying Power Query)
    • For this one, you will have to break your PowerPivot data model. In Power Query,
      Go back into the Power Query source query for the table uncheck the box for “Load to Data Model” Apply and Close to refresh.
      Edit the query and Change the name of the query, Apply and Close to refresh (but don’t re-check the box just yet).
      Edit the query and re-check the box for “Load to Data Model” Apply and Close to refresh – this will also load the table back to the Data Model with the new name.
      Note: You will have to re-apply your relationships and also add back any calculated columns which had previously been on the PowerPivot table. This is why it is a good idea to think through your naming conventions early.

 

  • Import additional PowerPivot tables using PowerPivot Import Wizard. Even if it is a really simple query, use Power Query.
    • So if one PowerPivot table is Power Query, they all must be Power Query

 

  • Don’t try to upgrade an existing Excel 2010 Data Model built using Power Query to an Excel 2013 Data Model.
    • This one actually makes a ton of sense to me. As Excel 2010 cannot load data DIRECTLY to the Data Model, you have to use Load to Worksheet then create a linked table to get the data into the Data Model. Recreating this model directly in Excel 2013 would result in a much more streamlined model.

One way to tell if you are creating trouble for yourself is to check your Power Query data connections to see if they are still OK.

On the Data Ribbon in Excel, select Data Connections

03_03_workbookCXN

 

 

 

 

 

 

Open each Power Query data connection, and view Properties

On the Definitions tab of the Properties window, look at the connection string.  If the Connection String box is white and looks like you could edit it, then you are OK.  However if the Connection String box is greyed-out, then “Houston, you have a Problem”!

03_04_CXN_Properties_good

At this point, you will need to disconnect the Power Query from the Data Model.

Right click the Power Query and select Edit.

De-select both the Load to Worksheet and Load to Data Model check boxes.

Apply and Close the Power Query.

Right click the Power Query again and select Edit.

Re-select Load to Data Model check box (and Load to Worksheet if needed).

You will need to put back any updates to the table in the PowerPivot model, such as relationships, calculated columns, and measures.

Power Query Version at the time of posting: (2.12.3660.142)

(Due to new bits for Power Query being published essentially monthly, I will note the Power Query version # when the post was created)

Use Power Query to Calculate Distance

Distance calculations! Over the past couple of years, I have had several clients who are using PowerPivot who want to measure the distance between two points on a map: Client to distribution center or building to building.

For just a few calculations, this is very straight-forward using plain old Excel. It has all the needed trig functions to calculate the distance in miles between two points using Longitude and Latitude where they are expressed in decimal form.

The first step is expressing each Latitude/Longitude of both of the coordinates in radians:

lat1 = (lat1Decimal / 180) * PI (3.14159265359)

lon1 = (lon1Decimal / 180) * PI (3.14159265359) – repeat for 2

Finally, use this formula in Excel to derive the distance in Miles. If you prefer km, then change the ending constant to 6371.

Distance_01

You can download a file which has the formulas: The table contains the coordinates, and the radian calculations are in columns H:K, and the final distance calculation is in column M.

Distance_02

The problem we encountered was when we wanted to calculate a million records which was too many formulas for Excel to handle gracefully.

We would have liked to use PowerPivot to solve this issue, but alas, there isn’t a Sin() or Cos() function in DAX (the PowerPivot formula language), so this was not possible.

Power Query to the rescue!!

Power Query is built on a language called “M” – kind of sounds like something out of a 007 movie, huh?

Let’s create the calculation!

I downloaded the From/To coordinates into Excel and formatted the area as a Range (Home Ribbon > Format as Table > select any format)

Distance_03

On the Power Query ribbon, you will see that there are many different data sources we can use to get data. In this case, we are going to select From Table.

Distance_04

This will bring up the Power Query Window. Since I named the table “Distance” in Excel, the Name Property in Query Settings has already named the query “Distance”.

The formula bar displays the syntax for sourcing data from an Excel data Range with a range named “Distance”

=Excel.CurrentWorkbook(){[Name=”Distance”]}[Content]

Distance_05

Just like in Excel, we could put the entire formula, including the calculation of the radians, in one gigantic formula, but it is much more difficult to read. So we will do it like Excel, and perform it in steps.

The next step is to add a column to calculate each of the 4 coordinates to Radians. To do this, you will select Insert Custom Column from the Insert Ribbon.

Distance_06

Enter the formula in the formula bar:

Lat1_Rad = ([Latitude_1] / 180) * Number.PI

Distance_07

Click OK to close the editor. Repeat this process 3 more times for the remaining coordinates:

Lon1_Rad = ([Longitude_1] / 180) * Number.PI

Lat2_Rad = ([Latitude_2] / 180)    * Number.PI

Lon2_Rad = ([Longitude_2] / 180) * Number.PI

After adding all 4, your query editor should look like this. You can see the column names that you entered, and on the right Applied Steps pane, there is an entry for each action we have taken. Currently the last item is highlighted, and the corresponding formula is visible in the formula bar.

Distance_08

PQ has performed an AddColum function on the Table, and for each item, has performed the requisite calculation. Scroll thru each step to see the syntax for each. Also, you will note, as you select each item, it displays what the table looks like up to that point. This is very evident when you click on Source, as the four calculated columns disappear. Click on InsertedColumn3, and they re-appear.  This is a great tool for debugging more complex M scripts.

Finally, we add the column for distance as “Distance_Mi”. Be sure to follow the capitalization structure from the Power Query reference EXACTLY as it is currently VERY case sensitive. I hope they ease this a bit in future releases, as it messes with my brain a bit to have to remember when to cap and when to ensure functions are lower case.

Distance_Mi = Number.Acos(Number.Sin([Lat1_Rad]) * Number.Sin([Lat2_Rad]) + Number.Cos([Lat1_Rad]) * Number.Cos([Lat2_Rad]) * Number.Cos([Lon2_Rad]-[Lon1_Rad])) * 3959

Now, in our final model, would probably want to keep the Latitude and Longitude columns in order to play with the cool mapping functions in Power Map and Power View (which we will do in a subsequent post), but you no longer need the “helper calculations” for the radians.

Distance_09Highlight the four radian columns, and from the Home Ribbon, select the dropdown on Remove Columns and select Remove Columns. Note the other choice here is to Remove OTHER Columns which comes in really handy when you have a list with a ton of columns but you only need a couple of them.

Distance_10

Cool, huh? Since the previous steps are still in the list, the calculations for the removed columns are still in memory, so the distance calculation still works!!

On the bottom right, you have selections for where to load the data. If you don’t need it in Excel, you can go directly to the PowerPivot model by selecting Load to Data Model. For our purposes, we are just going to Load to Worksheet, so ensure that is clicked and select Apply and Close from the Home ribbon.

Note: In Excel 2010 the only selection you can make is Load to Worksheet.  The Load to Data Model is available in Excel 2013 and O365.

Distance_11You now have a finished distance calculation! Test it by comparing the Excel formulas from the original table!

Download the Worksheet here.

In our next post, we are going behind the scenes of this calculation and start to look at some Advanced Editing functions within “M”.

Use Power Query to Inventory your files

Welcome to GirlsWithPowerTools.com! If you found this blog expecting to find scantily clad ladies with buzz saws, boy have you come to the wrong place!

If you would like to learn more about the great suite of tools that Microsoft Power BI has to offer, then you are in the right place. The Power BI tools encompass Microsoft Excel, PowerPivot, Power Query and Power Map, all showcased in BI Sites on SharePoint. On the server side, we will also dive into SQL Server Analysis Services Tabular models (SSAS Tabular, or just Tabular)

Who am I? I am Barbara Raney, a PowerBI Architect with Slalom Consulting. I have over 20 years of practical experience with Microsoft Excel, and 10 years consulting and training my clients to more effectively use Excel.

Why start my own blog? In all of my years working with Microsoft Excel, I love to figure out all of the cool things you can do with spreadsheets and have been the “go to girl” whenever questions come up. I absolutely love it when people say that “Excel can’t do that” – only to show them – “Yes it can!!”

As I mentioned, I don’t work for Microsoft, but I do a ton of work WITH them, which has really taken my love of all things Excel to a whole new level. In this blog, we are going to go on a journey learning cool new tips, tricks and techniques using all of the “Power Tools”. And “the Girls” mentioned in the website name? They are a group of wonderful geek girls who will be helping me along the way – and hopefully guest-blogging!

Many of you already know about PowerPivot, which is a great data modeling tool, and now the new kid on the block is POWER QUERY! This wonderful little add-in will work in Excel 2010, 2013 and Office 365.

Inventory all of the files on a drive

In our first adventure, we are going to use Power Query to take inventory of all of our files and then find where there are duplicates! This is a great way to free up some space on your disk. 🙂

First, download the Power Query add-in for Excel.

If you use Excel 2013, make sure you have the latest version at Windows Update.

Open a blank worksheet and check out the Power Query ribbon:

01-PQToolbar

You can use the Get External Data chunk to load data from a ton of data sources:

02-slide

The last item in the first list, “From Folder”, is what we are going to use.

One the Get External Data chunk, in the From File dropdown, select From Folder

03-FromFolder

Select the drive (or folder) that you would like to inventory and click OK

04-FolderPath

Voila!! Now you see a list of all of the files in all of the subfolders in the path provided.

05-expand

The inventory is missing one key ingredient: file size. For a file to be a true duplicate, the Name, Date Modified and Size all need to be the same.

Expand Attributes Column and select Size, then click OK.

06-SelectSize

Next we are going to filter out any non-Excel files by applying a filter to the Extension column

07-Filter

Take a look at the Query Settings pane on the right. In the Applied Steps area, you will see the three steps we performed on our data. If you click back on the Source step, you can preview what the data looked like when you started, as you select each step, its action is applied. This is VERY handy when you are doing a particularly long transformation and you need to backtrack what you did without losing your work.

08-QuerySettings

Rename your query to Inventory, check the Load to Worksheet box, and uncheck the Load to Data Model box, and select Apply and Close from the Home ribbon.

Note: If you have Excel 2010, your only option is to Load to Worksheet.

09-WkbkQueries

Your beautiful list is now in Excel!  The right pane is the Workbook Queries pane, which can be toggled on and off from the Manage Queries chunk of the Power Query ribbon.

Find the Duplicates

We could use simple Excel formulas or conditional formatting to find our duplicates, but this is a post about Power Query, so we are going to use it to find our dups.

We need a second query which starts out just like the first, but then goes just a bit further. Even though it is short, I am going to show you a short cut to create a copy.

In Excel, right click on your query and select Duplicate. Yes, it is just that easy!

10-DuplicateQuery

Right click it again, and select Edit, as we need to take this one just a bit further.

We only need to keep in the query the columns that define an exact match. Highlight Name, Date Modified and Attribute.Size, and in the Remove Columns dropdown, select Remove Other Columns

11-Remove

Next, highlight all three columns again, and select Group By from the Transform chunk.

12-Groupby

Ensure that all three columns are in the Group By section and that the Operation is set to Count Rows, then click OK.

Now you have a count of how many times each file appears in the folder. In my case, I filtered the list to remove the ones and I have 4 files with 4 version each!

13-GroupbyResults

Now it would be nice to not have to use VLookup or something like that to find out where they live in the folder structure, so I am going to Merge this list with my full inventory to create a second list of just duplicates!

From the Combine chunk, select Merge Queries, then select holding your shift key down, select Name, Date Modified and Attribute.Size from both tables and check the box to Only include matching rows and click OK. I don’t need the whole inventory again, just the records with the duplicates.

14-Merge

Expand the NewColumn column, and select Folder Path and click OK.

15-NewCol

Double click on the NewColumn.Folder Path column and rename to Path. This is the finished query! Again, you can see each of the applied steps. If you want to run this query for a different path, simply right click on the Source step in both queries and change the path to the new one!

16-dups

Go ahead and Apply and Close to load your dups into Excel, and clean up that drive!!

17-apply

Download the file

Hope you like our first foray into Power Query – Come back next time for a cool Distance Calculation with Power Query!