When working with data in Power BI you often find the need to create a new column, which can be done from the transform data label “add column”. Once in there you choose which way to create the column is more useful in your case, to serve as an example I will be using the custom column option.
After selecting our option, a window will appear to let you enter whichever formula you want to use for your new custom column. Here you can choose the name and the data of the new column, but there is no button to choose the type so, what do we do?
Well, since there seems to be no other option, we click OK and we will find the new column does not have a proper type. We could add it on a new step, but that’s not what we want. Instead, we are going to specify the desired type on our formula bar as shown:
With this your data type should be successfully changed.
Data must be treated in a clear, organized way that makes it easy to create relationships between different data tables, or directly to make the visuals that use our data to show information.
In order to give a good structure to our tables, we must identify its primary key. Doing so will allow us a way to clearly refer to every row of data in our table, making it easy to create relationships with other tables using our table’s primary key as a foreign key in the new table to connect them.
This can be obvious in cases that have clearly identified data (for example a list of people with its identifier, even if two people have the same name, surname, and income, each one of them will have its own unique ID so you can tell they are not the same person). But in some other cases, it may get a bit tricky.
I have created fake data to create an example of this tricky situation. Specifically, a cinema data of the expected income per week and film on one table and the track of the tickets sold in one concrete month. Please notice that the data may not be realistic, but that is not our current concern. The power BI data would look like this:
We’d probably have more tables (for example one that relates the movie ID and session with the price of the ticket), but we only need this two for the current explanation.
Imagine that with this data, the client asks for a way to see whether the expected income for each movie is being obtained in each week, you will need to use both columns (movie ID and week) to identify the expected income on one table, and you’d need to do the same to count how many tickets were sold in that week for each film (with which you could, later on, relate the tickets to their price). It is easy to notice that we’ll be working with composite keys.
Identify primary and foreign keys
In our “Expected Income” table the primary key would be made of the movie ID and the Week since there’s only one expected income per movie each week.
In the “Month sessions track” table the ticket ID would be the primary key since it is unique for each row of data, but we cannot use it to create the connection we are looking for. Knowing this we look for a key that we can use to create a connection with the “expected income” table.
The first thing that is clear is that we are looking for a one-to-many relationship, where the table with the income has only one row for each pair of Movie ID and Week but the Month sales track will have multiple instances for each pair. To make it more visual, I’ve made a representative schema with just one pair (of course both tables have more fields as shown with the previous images, but just so you get the idea):
In some working environments, this kind of relationship is detected by the app or web, and you are given the option to create the relationship between these two tables making use of this composite primary key, but power BI is not one of these environments.
Power BI procedure
For creating a relationship
In order to properly work with this table’s primary key in power BI, we must create a new column combining all the columns that compose the primary key. This can be done by adding a custom column with the concatenate operator or by adding a column from examples.
If you choose the custom column option, the expression for the new column should look like the following.
There are two crucial details to notice here:
You must transform the number into text in order to use the concatenate operator, if you try to use it directly with a number Power BI will complain with errors.
The “- “does not add any information but will act as a delimiter for any human who works with this data to easily understand where one field begins and the other ends. I used – but you can use the one you prefer.
Then you should go to the other table and apply the same procedure.
Once done, you have a column that acts as a primary key for the “Expected Income” table and the same column acting as a foreign key in “Month sessions track “. This allows you to tell Power BI to create the one-to-many relationship that we know exist between both tables. To do so close and apply the changes you’ve made in power query editor and get into the power BI’s model environment, add a relationship connecting the new fabricated columns and you are done.
Your new relationship should look like this:
For merging the queries
Depending on your situation you may want to merge both queries into one instead of creating a relationship between them. If this is your case, then power BI does allow you to merge two queries by as many fields as you want.
To do so the first thing is clicking over the merge queries option on power query editor (I will merge them as a new query, but you can choose to do a normal merge if it suits you).
After clicking a helper window will appear, once there you select the tables you want to merge and then select as many columns of them as you need by holding “ctrl” while you click over them. Once the fields are selected, choose the kind of join that better suits your objective and click OK.
After clicking OK your new merged query will be ready for you to modify or visualize as it pleases you.
When managing data on Power Query Editor we load it from our sources and modify it as we need to please our clients, but sometimes we get errors in the process or need someone to give us some advice, which is a lot easier if our queries are organized in a way that they are easier to go through. I will explain how I recommend organizing your queries to have a better working environment that allows you to find things faster.
A group is what Power Query Editor offers you to have your different queries stored in an organized way, you could say that they are similar to a folder. When you first upload your data and click over transform data, the Power Query Editor will start with just the query containing the source of data you have just uploaded.
As it is shown in the picture, there are no groups of any kind, just your query with the data you got from your source. This seems completely fine now, but when you have over 5, 10, 20, or more queries and parameters it becomes messy:
Even with the representative names that I tried to give to my queries, my environment looks unorganized. By grouping those queries, it’s more intuitive where to look for concrete data, either for further development or debugging:
As you can see, every sheet is grouped depending on whether it has been modified or not, which source path it came from, and so on.
You could freely choose more or less grouping as you find it easier to locate yourself in the environment while working but creating at least two or three groups is definitely going to help your work look more organized.
Also, you can hide the queries you are not interested in working with by clicking over the black arrow, which makes the environment more compact even if you have a million queries (which is very extreme, but you get the idea).
How to create, delete and ungroup groups
To create a group, you have to right-click over the grey space in your Query tab, then press “new group”, after doing so, fill up the information of its name (the description is optional but can help you remember what the data in the folder was for):
After creating your first group, it will be empty, and all the queries will be stored in an automatically generated query. To get the queries you want into your group, just click over them and drag them to your group. After doing so, you will obtain something like:
This automatically generated group is a feature of Power BI, and you cannot do anything about it. You are able to rename, ungroup and delete every group created by yourself, but none of the above actions can be performed towards the automatically generated group called “Other Queries”.
When working with any other group, to delete it will erase the folder and all the queries it contained, so be very careful about the groups you decide to delete. To ungroup on the other hand will make the folder disappear, but the queries will not disappear with it.
Apart from that, you can also create new groups inside a group if you find it fit.
To either delete, ungroup, rename, add a new group or query, and many other things, you should only right-click on the group you want to perform over as shown in the next image:
As you can see, there are multiple actions you can perform, which are intuitive enough as to not have to explain what they are for.
The most confusing one may be “Properties…” since it will only show you a pop up with the name and description the group was given when created.
Which groups should you create
Of course, this is up to you and the standards of the organizations you are working with, but I will give you the pattern I recommend.
In my opinion, you should always keep your sources just as they came in and make all the changes to them into a query duplicated or referenced to it. The reason for it is that if you notice you are missing data or your query is giving an error, you can check whether it was you who accidentally deleted a necessary column or row, or on the other hand, such information wasn’t included on the query from the beginning.
Therefore, I recommend having at least two groups, one named Sources to store the queries that are just as they were loaded, and a second one named Transformations where you can find the queries that have steps applied to them.
Apart from those two, sometimes you will need a third group to store parameters, this is so because parameters can be referred to either your sources or the transformations in your queries, so if you don’t create an extra group for them and have them mixed with the queries they relate to, it can end up getting messy, which we want to avoid at all costs.
This is the base structure that I’d recommend to everyone, leaving the other Queries table either empty or filled with queries created by yourself (meaning they do have steps but do not come from any source in your Sources group).
A source path is what gives you access to the place where data is stored. This means that when you move your data without uploading the source paths of the tables that referred to it, these tables will no longer be able to find their data, producing an error.
Similarly, when you find yourself working in groups, there is a chance each member of the group stores data in a different place with different access rights, which will also generate errors for the one member who does not have the credentials and has the data stored elsewhere.
Another situation is when you are moving a Power BI file from a development to a production environment. The source in the development environment will be different from the source in the production environment.
To solve those inconveniences there is the possibility of creating parameters that contain a list of the different source paths where data is kept, in a way that each member will only have to select its own source path to be able to access its data.
The procedure for doing so is explained in a restricted way in the video linked below. (By a “restricted way”, I specifically mean that it is thought for when you are working alone but have two different computers so the source path of the files you extract from your computer will change each time, which implies similar paths)
However, when the paths are not so similar or you are accessing a folder on the net, you have got to make things different than in the tutorial, because data extracted from a web (like SharePoint), is accessed in power BI via its domain, and nothing else, which means you won’t be able to specify the folder or file you want to access when first accessing the data.
Creating source path parameters
Firstly, we will need to upload the data from wherever it is stored, to give an example I will use data in https://dataxintelligence.sharepoint.com/ sites/AllDataX, specifically accessing the file “DataX Workplan 2022.xlsx”.
Since the data that we want to access is in a SharePoint Folder, we upload it by using power’s BI option to get Data from a SharePoint Folder. Here you can enter the link to SharePoint as text or as a parameter if you already have it. In this case, we are yet to create our parameter, so we upload with the link as text.
Remember that the link cannot be the whole URL or power BI will give an error, the provided link must refer only to the domain, without entering any folder or concrete file:
Once we enter the URL we click on “Transform data”
And therefore, a new tab in power Query will appear, here is where we can create our parameters and add them to our path before applying any changes to the data. Since the site your colleague gets data from, and the site you get it from, are not similar at all, we will replace the whole source path that is hardcoded in the source of our data (if you want you can let the https:// since that is compulsory, I will add that to the parameter for simplicity)
In order to do this, we create our first parameter by clicking on the “new parameter” option inside “Manage Parameters”
Another small window will pop up so that we can customize our parameter at will. Since we are willing to change the source path between ours and the others we select text for the type (links are considered text) and a list of values for suggested values. After selecting this, we fill up the possible values with the different URLs we want to have access to:
Once we have all the URLs we click OK and we can see our SourcePath parameter in power Query
Once we have it, we go to our source step in the query we were loading and substitute the link with the parameter
By doing this we are now able to change the SharePoint site we are getting data from by selecting it in our parameter. This is quite all right, but we are not done yet, we have to access the specific document we want from the site. As said before, for the example I am going to access “DataX Workplan 2022.xlsx”, we do so by applying a filter (in my case I filter so that the column “name” equals the name of my file, but everyone will filter as it suits best for the occasion). After applying the filter, we click on the “Binary” word that is written in the “content” column and power Query will add some steps to allow you to see the file’s content as shown:
We must be careful with those created intermediate steps because they most likely contain a hardcoded path to the file, which would produce an error each time you changed the parameter because the source path and the path of those steps are not synchronized, one will change but the other will remain the same.
To solve this, you have to first talk with your colleague and check which folder structure each one of you uses to store the file, with that you can locate which part of the path will have to change so that everyone has access.
In our example, the path I use to get the file is:
Paying close attention to it we can see that from “https “ to “/AllDataX” we have the text that is selected in the parameter SourcePath we have already created, and the rest of the link is the relative path from AllDataX to the folder that contains the file. My relative path is “/Documentos compartidos/General/” but my colleague´s path can be something completely different like “/Work Docs/General/”, “/Documentos compartidos/Work00001/”, or others increasingly different.
I will assume that in this case, my colleague’s relative path is “/Documentos compartidos/Trabajo/Horario/” which means that we differ in that where I access “General” they access “Trabajo/Horario”, so that is the part of the path we will create a new parameter for, just in the same way that we created the first one (but with a different name).
When you have both parameters, it’s time to insert them into the hardcoded path so your query will not break when changing your source, pay close attention to which step you have to modify:
In our current step we can see our relative path, but it is not really a relative path, it represents the name of the step before the one we are in, which means this is not the path we want to modify.
This previous step is the one that contains the actual path to the file, the one we want to add parameters to. Knowing this, we click on the step and we should find this:
The path is written after #”Folder Path”=, and we will insert the parameters we created by writing them in as shown:
You can see that the parameters we added are called SourcePath and RelativePath_forWorkplan, but they can be called as you please, just give them names that will help you remember what those parameters are for. If you have modified this path and clicked enter, and no error warning is showing, congratulations, you are done.
If you do get an error like the following, please check that the path is correctly spelled:
A last piece of advice
You can add these parameters before or after having your queries modified, but I recommend you do it before so that you start with the source connected the right way and do not have to worry about it anymore.