Data Clean-Up using Google Sheets Functions
In November 2020, TechSoup invited me to talk about our methods of data-clean-up with Google Sheets Functions for their TechTips Digest event. The list of all recordings from this event is now available on the Netsquared blog.
Before we bulk-import thousands of records into a CiviCRM, review the data mostly giving in a CSV (comma-separated-values) file. We systematically go through the formatting issues and separate clustered data, like First and Last name or all address information in one cell.
For this talk, I collected the various use cases for which we use Google Sheets functions to clean up the data before we start the importing processes.
We share slidedeck, recording and transcript in this post.
If you are interested in receiving information about our course on Google Sheets, leave your name and email address in below form, please.
Slidedeck
Video Recording
Transcript is available below.
More Resources
- Google Maps: How to make a map from a spreadsheet and embed it on a web page
- Google Sheets function list
Google Sheets Functions Course
Take a deep dive into data clean-up and Map data and other data processing with Google Sheets
In this talk, we went through the various Google Sheet functions very fast and only for one use case. Data-cleanup. In our course on Google Sheets, we will show you different functions for different use cases, not only for cleaning up data for your CRM imports. You can use Google Sheets data for your interactive maps, resources for you #nocode website. The beta version of this course will start in January 2021. We invited you to be part of the beta version free of charge. If you are interested, please leave our name and email address, and we will notify you when we start our first cohort.
Yes, I am interested in your Google Sheets Course
Table of Content + Transcript
- Google Sheets Functions Course
- Speaker introduction: Birgit Pauli-Haack
- 10 use cases for data manipulation with Google Sheets
- 1 Couple’s first names in one field. Split(C3, ” and”, FALSE)
- 2 Multiple Address formats. PROPER() function
- 3 Address – all in one cell. SPLIT(AC2 ,”,”,TRUE)
- 4 Split State + Zip code
- 5 Split too much, aahhh. Two-word States. JOIN(” “,AI38:AJ38)
- 6 Convert “FL” to “Florida”. SWITCH(AI5, “FL”, “Florida”, “GA”, “Georgia”, AI5)
- 7 Multiple Zip Code columns. =IFS(AH37 >0, AH37, ISNUMBER(AJ37), AJ37,AK37>0, AK37)
- 8 Review final data – Hide / unhide columns
- 9 Username/ Password for dozens of people. CONCATENATE(LEFT(D2, 1),LEFT(F2,6)) / RANDBETWEEN(11111111, 99999999)
- 10 Bulk-copy cells
- Question and Answers
- Does Google Sheets have a special paste function?
- How to automatically eliminate or hide empty cells?
- Can you turn state names and just standard abbreviations?
- What’s the difference between the switch function and Find and Replace?
- Can you get rid of the useless columns?
- Five-minute demo to create a Google Map
- Other resources for learning the anatomy of each formula
Speaker introduction: Birgit Pauli-Haack
Joyce Bettencourt: Today, we have Birgit Pauli-Haack, and she is going to present on Data clean-up using Google Sheets. Birgit has been working with nonprofit organizations on database handling and website development and design and tech training since 1998. And she’s the owner of Pauli Systems. And with her team of designers, developers, sysadmins, and database professionals, she helps nonprofits to reach out to donors and members and other supporters through an optimized digital presence and provides integration between websites and CRMs. So with that, I will turn it over to Birgit.
Birgit Pauli-Haack: Well, thank you so much, Joyce. It’s wonderful to be here and thank you for attending this session. We don’t have a lot of time today, so I’m going to share my screen, and we dive right in. And what you also see is that on the slide deck, you see on every page, the link to the online version of it so feel free to check it out afterwards. Questions, please till the end. And now let’s go and tackle the Google Sheets and your data clean-up.
Data clean-up with Google Sheets is definitely a skill to learn, and it comes quite handy in a lot of data manipulations situations, not only just the data clean-up, but every time you’re using CSV files and want a hand do something else with it, you could use Google Sheets for add. So some of the things that I show you is equivalent to Excel.
10 use cases for data manipulation with Google Sheets
So it’s not a wasted skill. You can use it in any of them. It might just be a little of a different word or different function call or something like that. But you can look that up later. Okay. So, what will you learn today? I have for you 10 use cases for data manipulation with Google Sheets, and we talk about split, proper, join, switch, Ifs (), concatenate (), left () randbetween () and how to hide columns and how to bulk copy formulas. The first one is this data and most of the data comes from a real life example or real life examples that we deal with. And I picked some out. So the first case is that you get the CSV file, you load it up and see you in Google Sheet. And you see that the first name is actually has four couples. The first name of both couples in the one field and Joe CRM doesn’t like that very much helps you with it gives you all kinds of different problems.
1 Couple’s first names in one field. Split(C3, ” and”, FALSE)
So, first we learn about the SPLIT() function, and this is definitely one of the most used functions. You have a text and when it says text, it means it’s the cell that you’re pointing at, like the C3 and the example. And we have the delimiter of those two things, and then the delimiters and with a space in between. And because we don’t want to cut it up after the end, not on the after the end or the [inaudible 00:03:08] . We just want to kind of after the whole word. So, and what you do is you need two more columns because all of a sudden you have two fields, additional, you have the first name and the spouse’s name. So you add two columns and then you create, put that formula into the field of, and then it will split out the spouse name next to it. So that’s fairly easy and it gets you at least somewhere to get the first name cleaned-up.
2 Multiple Address formats. PROPER() function
In the same data set, we also found the second use cases, Multiple Address formats right now. And just ignore that this all in one field right now, we began to talk about it later, but you have small letters, we have all CAPs, we have different spellings. And for me, that needs to be cleaned-up either in one consistent way. And I introduced it to my favorite function that is proper gives you proper case. It’s actually title case that it does. It will return every first letter of each word will be uppercase, all the other letters will be lowercase. The relative function related to that function is also LOWER() and UPPER().
If you are not like me, that you don’t like uppercase, you actually favor it, you would say, okay, if you have mixed case, you would just wrap it up in an upper function and then have another column Luther right formatting. So you do you get the principle is that you have an empty column. You have a column that needs to be changed. You open up another column and then you apply the formula and it makes a copy. But it’s the right copy. Yeah, like here, the number, the address and number seven is now title case in number six in row six, Tallahassee, all CAPs is now also a title case. So that’s another set of that. So split and proper.
3 Address – all in one cell. SPLIT(AC2 ,”,”,TRUE)
Now let’s deal with the all-in-one address. And you already learned one of the one function what’s called split. Let’s try and apply it here too. So, we need additional four columns and that the, for the address for the city, for the state and for the zip code, and they’ll all seem to be the limited by a comma.
So that will be our delimiter, where to split and that’s how the function lift is split. And then AC two, that’s this column, AC two is the second row and then use the delimiter of comma and then do it for all occurrences. And that gives us the address line here, the city, yeah. In one column.
4 Split State + Zip code
And then we also have the state and zip code and you see that as some additional things to do, because for some of the addresses, Iowa and the Zip code, we’re not separated by a comma, so it didn’t catch it. And we can do a second pass at it and split up the zip code. Again, you need to do two additional columns and then you do another split, the split AG two, that’s this one here and then divide by not by comma, but by space.
And then you get it into columns again. Yeah.
5 Split too much, aahhh. Two-word States. JOIN(” “,AI38:AJ38)
Well, when we did the split for the state and the zip code, we didn’t provide for a two word state and we have a lot of them. You have North Carolina, South Carolina, and New York, New Jersey, they’re all two word States. So you want to kind of do another, you want to do the opposite of it, but you need another column for it, the opposite of it and combine those two. And that is the join function and concatenates the elements of one or more one dimensional list, Arrays equals list using specifying the limiter. And that looks like this. We joined the two columns again and put a space in between. Yeah. So we, the two columns, AG to AJ AI to AJ and then a space in between. And then we have the state name and that’s the end of the state part, but we also saw film what you see and you see it on this screen as well is we have three columns with different zip codes.
Some of them are empty. Some of them are things in there. Then we have the combination, but let’s first figure out something else we saw here. Yeah.
6 Convert “FL” to “Florida”. SWITCH(AI5, “FL”, “Florida”, “GA”, “Georgia”, AI5)
That is Florida and GA instead of Florida and Georgia, let’s do that. And I introduce you to the SWITCH () function, which means tests an expression and expression is a cell that you can point to against the list of cases and returns the corresponding value of the first matching case. So for this case, we have, if there’s FL in the cell, turn it into Florida. If there’s GA in the cell, turn it to Georgia. And so you have another mother in that same column, you can put that in there because it will function here for everyone, even the ones that, because it only goes to this column down. So now you have all the States in full form there.
7 Multiple Zip Code columns. =IFS(AH37 >0, AH37, ISNUMBER(AJ37), AJ37,AK37>0, AK37)
So what you have already mentioned is that we have three columns with zip codes and one column AJ actually has words and numbers. So let’s tackle that and you guessed it. We will add another column and then combine this with an Ifs () function. So the Ifs () function, evaluates multiple conditions and returns a value that corresponds with the first true condition. It looks more complicated than it actually is. So we evaluate each zip code column and then do something with it. So the formulas Orland is also column AI column. And it looks AH that’s, this one, if it’s larger or greater than zero, use what’s in the column. Otherwise ignore. If AJ 37, that’s the one where we have the second word of the two words States in there from our previous split. But if it’s a zip code or if it’s numbers, then use that number here.
Yeah. And then if the AK is 37 or AK is 37 here, is greater than zero, use that. And that gives us the full zip code for all the entries. If we copy that over to all that we kind of summarize that in a different way. So, and now you’ve to find a review. You want to see an eye, so you’d be not getting confused.
8 Review final data – Hide / unhide columns
And I don’t get confused. I highlight the column with a different color. That is actually the one that I want to use for my data input and to review it, I’m I hide the columns and then so I can see it. So because you have a multi-step kind of function for one thing to the next, do not delete those columns because some formulas depend on something in that column or content in that column. So you can hide them.
And then if you want to print stuff, print them out or something like that, or make a copy of it or something like that. You can unhide them and hide them. I also, so when you import into a CRM, you use the full spreadsheet as a CSV file, and you will map out the columns with the data fields in your CRM. But if you have multiple zip code fields in the header, you don’t know which one do you want to use. So I give the one that I’m going to use one particular header. And I write that down here, it’s the all zip and the state field will be populated from the state name column. The address line one and the city, that’s all clear, but for the city states, we have multiple columns of each. We were ready to import that.
9 Username/ Password for dozens of people. CONCATENATE(LEFT(D2, 1),LEFT(F2,6)) / RANDBETWEEN(11111111, 99999999)
And now I have another use case for you. See, if you have now maybe 60 or something, people who want to be part of a crowdfunding campaign, and you want to your system needs username and password for each of them, and you want to send it to them. So they don’t have to mess around with things right there in an email. So you could use the mail merge extension to Google drive. But first you need to create a username from the data as well as a password. And we used two more function. And this also shows you that you can Daisy chain some of the functions. So for the username, we take the first letter of the first name and the first six letters of the last name. So, and then combine it to one. The combined end with one is happening with the CONCATENATE function.
And the first letter of the first name is a left function, cannot left first name one is the number of character. And the password we just do random numbers that all created through that Randbetween function. Okay, let’s see how that looks. So for the first one it’s concatenate left D2 is where my first name is. That’s a little further over there, and we do one letter of that and then reduce the Left F2 is the last name. Really? We use the left six letters, how you do this is relatively. You can do the first three letters of the first name at the three letters of the last name, how you make some action is yeah, it’s up to you, but it gives you something like that. The username there, and for the password we use Randbetween.
And because I want an eight number eighty to password, I say, give me a number between 1,000,011, 1 million, 100,000, well, eighty ones and between eighty ones and eighty nines. So I don’t have to, and it creates a password for them, or, yeah.
10 Bulk-copy cells
So, and now how would you, one last feature would be to see how you can then bulk-copy all the cells into the other roads? Yeah. So when you hover over the little red square, not red loses, how about over the little red square when you mark a cell and then pull it down to all the whole row and it automatically replicates the formula and you can fill it up not to 20, you can do it to a thousand to 400 to 16,000. I hope I haven’t tried that dead 16,000. That’s my next, the next goal. What you don’t see on the screenshot is that there is, when you hover over the little blue square there, it shows you a very faint cross, and then that appears you can drag it down and then kind of do that. Yeah. So how am I looking for time? I’m almost on the time. Yes. Anything else? Well, the, as I said, nptp dot US slash pauli dash techtips is a slide deck where I can go and let’s do Q and A
Question and Answers
Joyce Bettencourt: Quick bootcamp and a Google Sheets. That was fantastic. So if any of you have questions, you’re welcome. If any of you have questions, you can put those in chat. I know, obviously this is a, these presentations are pretty quick. Bobby has a question. Does Google Sheets have a special paste function? Or you can copy and paste it with values and not formulas?
Does Google Sheets have a special paste function?
Birgit Pauli-Haack: Yes, yes. It has plenty of paste functions. One of them I like real much is when you paste it from another sheet, you have a decision to take the values that you have right now at this time, this moment in time, or you can paste it from the other spreadsheet and it updates this spreadsheet when the other one updates. So it’s kind of, you have to data connected that really also works actually in Google documents. So if you have a monthly report that you do, and every month, those the graphic kind of changes, you can make the document that reports about it, just copy and paste the graph and link it to the spreadsheet. And that would work as well.
Joyce Bettencourt: Great. And we have another one. Can you turn, Oh, actually, do you have tips on how to automatically eliminate or hide empty cells?
How to automatically eliminate or hide empty cells?
Birgit Pauli-Haack: Empty cells? Well, if you have empty rows, you can hide rows. I think, yes, you can hide rows and you can hide columns. You cannot hide empty cells because then it kind of, it moves right when you, so if you have in a row, in a column top 10 and top 20, you cannot eliminate the 15 in between or the 10, the five in between, because yeah. You would delete either the rows of it at the whole rows, and then it doesn’t work anymore. So yeah.
Can you turn state names and just standard abbreviations?
Joyce Bettencourt: Nina asks, can you turn state names and just standard abbreviations?
Birgit Pauli-Haack: Yes. There is no function for that, but you can certainly do that switch exactly the other way around. I would not type it in yourself. Yeah. You could probably get it from another spreadsheet kind of thing. Yeah. And then concatenate
() that and into the formula, but it gets a little complicated. Yeah. Most of the time your CRM is actually doing that. If you see your CRMs connected with the USPs API, then no matter what you do on the address in the import thing, it will normally also convert it to a standard, a USP as standardized address in your CRM. So yeah. I don’t know how much time you want to spend on that.
What’s the difference between the switch function and Find and Replace?
Joyce Bettencourt: Yeah. Yeah. And kind of following on that switch, what’s the difference between doing the switch function and doing a find and replace?
Birgit Pauli-Haack: Well, you can do numbers to the well, you need to do multiple steps on the switch or you can only search for FL and then replace all Florida. Yeah. You can only do kind of all find all the GAs and then two. So, it’s a multistep process when you do it and find & replace because unit, you can say, find FL or GA and replace it. So that’s why that is so interesting to do the switch for it.
Can you get rid of the useless columns?
Joyce Bettencourt: And let’s say Jane asks, if you doing concatenating a spreadsheet, can you get rid of the useless columns, turning result into usable columns and values and then delete the columns you don’t need. Not sure she’s clear with that, but…
Birgit Pauli-Haack: Well, you can do this. I wouldn’t recommend it because it’s much more work than you would want your chest height, the column. Yeah. If you need it for presentation or something like that, but you always have the original data there. I’m more like, not the leading thing so much.
Joyce Bettencourt: Okay. Any other questions? We probably have like room for what or so more. And I don’t know if there are any other while we’re waiting for questions, if there are any other tips that fall along this that you want to mention to Jane. So sorry. Birgit. Yeah.
Five-minute demo to create a Google Map
Birgit Pauli-Haack: And so I did, it might be four months ago. Eli also had a second has a kind of a speed dating thing where I did a five-minute Google Maps. Yeah. And that also use it Google Sheets, where you can put in all the addresses and then have them show up as a pin on the map. If you go to Netsquared dot org on the blog, you will see that part as well. It’s yeah. It was another multi-speaker event and it was really cool. And I actually do live demonstration, which I didn’t do here because I knew better.
Joyce Bettencourt: Yeah. Less tech problems. And I don’t know if there are any other final questions. Certainly I put the link in chat and I’ll grab that again here too, for folks.
Birgit Pauli-Haack: Yeah. Well, we are preparing a course for Google Sheets and all kinds of different things and yeah, that will start in January. If you are up for beta testing that just leave your name and email address, and we’ll let you know searching for things. So if you’re looking for that.
Other resources for learning the anatomy of each formula
Joyce Bettencourt: Yeah. And on that link, there was a form at the bottom that you can just fill in with your name and email at the end, diverse lies that document, the links that I just shared. And Charlie was asking if you had any other resources for learning the anatomy of each formula, the how to know what to put in each part. So, yeah,
Birgit Pauli-Haack: Right. There are some, the Google documentation is fantastic. So in some slides with the formula where that has the best expression of it, there’s a link to the Google documentation. And there you can read up on about each formula as well as seeing what’s related formula and also a list of quite a few formulas there.
Joyce Bettencourt: That’s great.
Birgit Pauli-Haack: Yeah. So, Marlene asked to recommend using Google Docs or is something else better? Well, it depends on, on your skill set and on how much you want to share things. Google Docs is really good at sharing things, collaborating on things and not losing things. When somebody overwrites stuff, I also Google for non-profits is free through TechSoup. So, that’s definitely a plus on that. And I really, especially when you have remote teams, something like Google Drive or Google Docs is definitely better than switching for different files from yeah. From Excel that you, yeah. You can only do version control them a bit to know which is the latest one when you’ve worked with a team.