We either write or use an app to go to the websites we want it to and make a copy of the specific things we want from those websites. It’s much more precise than downloading an entire website. Like any tool, web scraping can be used for good or evil. Some of the better reasons for scraping websites would be ranking it in a search engine based on its content, price comparison shopping, or monitoring stock market information. You might even use it as a research tool of sorts.
How Can I Scrape Websites with Excel?
Believe it or not, Excel has had the ability to extract data from websites for a long time, at least since Excel 2003. It’s just that web scraping is something most people don’t think of, let alone think of using a spreadsheet program to do the job. But it’s surprisingly easy, and powerful. Let’s learn how it’s done by making a collection of Microsoft Office keyboard shortcuts.
Find the Sites You Want to Scrape
The first thing we’re going to do is find the specific web pages from which we want to get information. Let’s go to the source and search at https://support.office.com/. We’re going to use the search term “frequently used shortcuts”. We can make it more specific by using the name of the specific app, like Outlook, Excel, Word, and so on. It may be a good idea to bookmark the results page so we can easily get back there. Click on the search result, “Keyboard shortcuts in Excel for Windows”. Once on that page, find the list of Excel versions and click on Newer Versions. Now we’re working with the latest and greatest. We could go back to our search results page and open the results for all the other Office apps in their own tabs and bookmark them. It’s a good idea, even for this exercise. This is where most people would stop in collecting Office shortcuts, but not us. We’re going to put them in Excel so we can do whatever we want with them, whenever we want.
Open Excel and Scrape
Open Excel and start a new workbook. Save the workbook as Office Shortcuts. If you have OneDrive, save it there so the AutoSave feature will work. Once the workbook is saved, click on the Data tab. In the ribbon of the Data tab, click on From Web. The From Web wizard window will open. This is where we put the web address or URL of the website from which we want to scrape data. Switch to your web browser and copy the URL. Paste the URL in to the URL field of the From Web wizard. We could choose to use this in Basic or Advanced mode. The Advanced mode gives us a lot more options on how to access the data from the website. For this exercise, we only need Basic mode. Click OK. Excel will now attempt to connect to the website. This may take a few seconds. We’ll see a progress window, if it does. The Navigator window will open, and we’ll see a list of tables from the website on the left. When we select one, we’ll see a table preview on the right. Let’s select the Frequently used shortcuts table. We can click on the Web View tab to see the actual website, if we need to look around for the table we want. When we find it, we can click on it and it will be selected for import. Now, we click on the Load button at the bottom of this window. There are other options we could choose, which are more complex and beyond the scope of doing our first scrape. Just be aware that they are there. Excel’s web scraping capabilities are very powerful. The web table will be loaded in Excel after a few seconds. We’ll see the data on the left, where the number 1 is in the picture below. Number 2 highlights the Query used to get the data from the website. When we have multiple queries in a work book, this is where we select the one we need to use. Notice that the data comes into the spreadsheet as an Excel table. It’s already set up for us to be able to filter or sort the data. We can repeat this process for all the other web pages that have the Office shortcuts that we want for Outlook, Word, Access, PowerPoint, and any other Office app.
Keeping Scraped Data Current in Excel
As a bonus for you, we’re going to learn how to keep our scraped data fresh in Excel. This is a great way to illustrate just how powerful Excel is for data scraping. Even with this, we’re only doing the most basic scraping that Excel can do. For this example, let’s use a stock information web page like https://www.cnbc.com/stocks/. Go through what we did before and copy and paste the new URL from the address bar. You’ll get to the Navigator window and see the tables available. Let’s select the Major U.S. Stock Indices. Once the data is scraped we’ll see the following spreadsheet. On the right, we see the query for Major U.S. Stock Indexes. Select that so it is highlighted. Make sure we’re in the Table Tools tab and in the Design area. Then click on the down arrow under Refresh. Then click on Connection Properties. In the Query Properties window, under the Usage tab, we can control how this information refreshes. We can set a specific time period to refresh, or to refresh when we open the workbook the next time, or to refresh in the background, or any combination of these. Once we choose what we need, click on OK to close the window and continue. That’s it! Now you can track stock prices, sports scores, or any other data that changes frequently from an Excel spreadsheet. If you’re good with Excel equations and functions, you can do almost anything you want with the data. Maybe try to identify stock trends, run a fantasy sports pool at work, or maybe just keep track of the weather. Who knows? Your imagination and the data available on the Internet, are the only limits.