Simple Scraper Tutorials

Before You Start:

Using the Simple Scraper is easy, by design. Here's what you need to get started: 

  • A Windows PC with .NET 4.5 installed. If you aren't sure whether it is installed, the installer file will tell you.
  • A URL that contains data or links to data you want to scrape, with a "Next" button to get to more results, or
  • A list of URLs, linking to pages formatted the same way, that contain text, HTML, or element attribute data that you want to scrape
  • (Optional: I recommend setting up a VPS. You can do this for free using Amazon. Read this blog post for info.)

Here's what WON'T work in this version:

  • Pages that require logins 
  • Pages that are not structured or formatted the same way  (you can often get data from these, but it may be inconsistent). 
  • Scraping images (you CAN scrape the URLs of images, and download them with other tools)
  • Pages with infinite scroll (Twitter and Facebook are good examples)
  • Captcha solving or proxies (unless you can use a machine-wide proxy)

Have a question or need support? Visit the support desk.

Download your free trial here.


How Does the Simple Scraper Work?


The Simple Scraper combines an automated web browser with XPath, a query language for selecting elements in XML and HTML documents. When you highlight parts of a website using the scraper, it uses AI to work out the XPath that locates the element(s) you selected. The automated browser then loads the pages you've entered, and the XPath determines which data to scrape.
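
To make the XPath idea concrete, here is a minimal sketch in Python using only the standard library. It is not how the Simple Scraper is built internally. The sample markup, the select_results function, and the Link.Href column name are made up for illustration; only Description.Inner_Text follows the column naming described in the tutorials on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a results page. Real pages are
# messier HTML and need a proper HTML parser or a browser.
PAGE = """
<html><body>
  <div class="result">
    <a href="https://example.com/a">First</a>
    <span class="desc">About A</span>
  </div>
  <div class="result">
    <a href="https://example.com/b">Second</a>
    <span class="desc">About B</span>
  </div>
  <div class="ad"><a href="https://example.com/ad">Sponsored</a></div>
</body></html>
"""

def select_results(page_source):
    """Use one XPath to find every result region, then read fields from each."""
    root = ET.fromstring(page_source.strip())
    rows = []
    # One expression matches all elements that share the same structure.
    for region in root.findall(".//div[@class='result']"):
        link = region.find("a")
        desc = region.find("span[@class='desc']")
        rows.append({
            "Link.Href": link.get("href"),
            "Description.Inner_Text": desc.text,
        })
    return rows

for row in select_results(PAGE):
    print(row)
```

The single expression .//div[@class='result'] picks up both results and skips the ad, which is the same reason one sample selection in the scraper can highlight every matching item on a page.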

Of course, you don't need to know any of that - you just need to click a few buttons and it will do the rest. It's that simple!

Oh - I've included some sample templates inside the install file. You'll find them at 
C:\Program Files\Outscrape Simple Scraper\Templates.

Single-Level Navigation


This first tutorial walks through scraping a Single Level of web pages. You can also think of this as a flat, or horizontal, scraping template. An example would be scraping data from Google's search result pages without loading the results themselves.

>>Choose the "Single Level" Template if you want to scrape data without going inside result pages. For a better understanding of what template to choose based on what data you're looking for, watch the full tutorials at the bottom of the page.<<

1. Let's Start Scraping:

Step 1: Load the Outscrape Simple Scraper, and choose the "New Template" Button in the top right. Choose the Single-Level Template.

In the window that appears, enter a URL that contains data you want to scrape. In this example you can see a Google search results URL.

TIP: The scraper cannot enter text or click buttons while it's running, so start from the URL you reach AFTER typing your search term or clicking any buttons. Just get there in another browser, copy the URL, and paste it here. Don't use, for example, www.Google.com if you really want the results page.
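
If it helps to see what such an "after the search" URL looks like, here is a small Python sketch that builds one. The results_url helper is hypothetical, and the q parameter reflects how Google search URLs commonly look rather than anything specific to the Simple Scraper. Copying the URL from your browser, as described above, remains the reliable route.

```python
from urllib.parse import urlencode

def results_url(base, **params):
    """Build the kind of URL a browser shows after a search form is submitted."""
    return f"{base}?{urlencode(params)}"

# Use a URL like this in the template, not the site's home page.
print(results_url("https://www.google.com/search", q="used office chairs"))
```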

Choose the Single Level Navigation Template to start.

2. Choose the Next Button (Or Arrow):

Step 2: Click the drop-down to choose a command. Choose "Select Next Page Button".

Locate the button, text, or element that you click to get to the next page. For example, on Google this is either the  word "Next" or the ">" arrow. 

RIGHT CLICK this. It will highlight red, to show that it has been selected.

TIP: If there is no next button, watch the tutorial on scraping from lists of URLs and come back after you've seen it!

Choose the "Next Button"  that advances to the next page of results.

3. Choose the Data's Region:

Step 3: Click the drop-down to choose a command.  Choose the "Select Region Around Data To Scrape" button.

Move your mouse cursor over some of the data that you want to scrape. You'll notice an orange, rectangular highlight. Right click when you highlight the correct area of the data you want to scrape. A red highlight will appear on all the samples of data that match your selection.

Your goal here is to move the mouse until you highlight all of an example of the data that you want to scrape. For example, if you want to scrape the URL and description of a Google search result, you could highlight any of the 10 results on the page, as long as the rectangle covers the data entirely. If there is only a SINGLE example of the data you want to scrape on this page, you probably chose the wrong template.

Choose the region of the data you want to scrape.

TIP: You can do this highlighting more than once, so if you make a mistake, you can simply re-select the appropriate area. And, if you cannot highlight all of the data at once, that's fine too. Remember - you are only selecting a sample of the data you want.

4. Choose the Data:

Step 4: Click the drop-down and choose the "Select Sample Data" button. Inside the region that you selected, use your mouse to create an orange highlight around an example piece of data that you want to scrape. Right click.

In this step, you should imagine each piece of data that you want to select as a separate row in a spreadsheet. The results will appear that way. You will want to highlight individual pieces of data and right click them. 

In the window that appears, use the checkboxes to choose which pieces of data inside your selection you want to scrape. Use the "Field Name" window to change the name of your "Column" of data in the CSV. The resulting column name will be your field name, a dot, and the attribute listed beside the checkbox (for example, Description.Inner_Text).

Highlight the data inside your selection that you want to scrape and right click.

TIP: You can hit cancel if your selection isn't accurate. You can also go back to the previous step after choosing your Data in this step if you'd like to select more than one Region of Data.

5. Save and Run Your Template:

Step 5: Ready to run your template? Click "Save Template", name it something you will remember, and click OK. Then click Run Template and choose your saved template.

TIP: Your scraper will run until it can no longer find a "Next" button. Sometimes, the HTML code that the scraper uses to recognize the next button continues to exist after it is no longer clickable. In those instances, the scraper will need to be shut down manually. (You will notice it repeating the final page.)

Click save, name your template, then click run and choose it.

6. View Your Results:

Step 6: Load your results by visiting the Outscrape_Sessions folder.

The Sessions folder is located inside your Outscrape directory. Your results will usually be in the last (newest) session folder. The folders are named according to the date and time they were created:

Session_170630104525 = 2017/06/30 10:45:25
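
If you ever need to decode a session folder name in a script, the pattern above maps directly onto a date format string. This is a minimal Python sketch; session_time is a hypothetical helper, and it assumes every folder follows the Session_YYMMDDHHMMSS pattern shown in the example.

```python
from datetime import datetime

def session_time(folder_name):
    """Parse 'Session_YYMMDDHHMMSS' into a datetime."""
    stamp = folder_name.split("_", 1)[1]
    return datetime.strptime(stamp, "%y%m%d%H%M%S")

print(session_time("Session_170630104525"))  # 2017-06-30 10:45:25
```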

The file to open will be called "output_[TemplateName].csv". This file can be opened while results are still being saved, even if it appears to be 0 KB. Don't edit and save the file, however, until the scraper has stopped. 
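
Because the output is a plain CSV, you can also read it with a script instead of a spreadsheet. The sketch below uses Python's csv module on an in-memory stand-in for the file. The Link.Href column is a made-up example, and the encoding you would pass to open() for a real output file is an assumption you should check against your own results.

```python
import csv
import io

def read_results(csv_file):
    """Return the scraped rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))

# Stand-in for an output file. A real one would be opened with something like
# open(path, newline="", encoding="utf-8") (the encoding is an assumption).
sample = io.StringIO(
    "Description.Inner_Text,Link.Href\r\n"
    "First result,https://example.com/a\r\n"
    "Second result,https://example.com/b\r\n"
)

for row in read_results(sample):
    print(row["Description.Inner_Text"], "->", row["Link.Href"])
```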


Your results are accumulated inside the Sessions folder, in a CSV file.

Multi-Level Navigation and Scraping


This second tutorial walks through scraping Multi-Level tiers of web pages. You can also think of this as a vertical, or deep, template. An example would be scraping the data from products or pages INSIDE a search result on Amazon, Craigslist, or a real estate site.

>>Choose the "Multi-Level" Template if you want to scrape inside result pages. For a better understanding of what template to choose based on what data you're looking for, watch the full tutorials at the bottom of the page. <<

1. Let's Start Scraping:

Step 1: Load the Outscrape Simple Scraper, and choose the "New Template" Button in the top right. Choose the Multi-Level Template.

In the window that appears, enter a URL that contains data you want to scrape. In this example you can see a Craigslist results page URL.

TIP: The scraper cannot enter text or click buttons while it's running, so start from the URL you reach AFTER typing your search term or clicking any buttons. Just get there in another browser, copy the URL, and paste it here. Don't use, for example, www.craigslist.org if you really want the results page.

2. Choose the Next Button (Or Arrow):

Step 2: Click the drop-down to choose a command. Choose "Select Next Page Button".

Locate the button, text, or element that you click to get to the next page. For example, on Craigslist this is the word "next".

RIGHT CLICK this. It will highlight red, to show that it has been selected.

TIP: If there is no next button, watch the tutorial on scraping from lists of URLs and come back after you've seen it!

3. Choose an Example Result Link:

Step 3: Click the drop-down to choose a command.  Choose the "Select Example Page Link" button.

Move your mouse cursor over a link that goes to a page that you want to scrape data from. You'll notice an orange, rectangular highlight. Right click when you highlight the correct link. A red highlight will appear on all the links that match your selection.

Your goal here is to find a sample link that, when selected, creates a red highlight on all the links you want the scraper to visit.

Once you've done this, click to load one of the sample link pages you'd like to scrape data from.

TIP: You can do this highlighting more than once, so if you make a mistake, you can simply re-select the appropriate area.
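
Conceptually, the link selection in this step amounts to one pattern that matches every result link on the listing page, which gives the scraper a list of pages to visit. The Python sketch below illustrates that idea with made-up markup and a hypothetical result_links helper. It is not the scraper's own code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical listing page: two result links and one navigation link.
LISTING = """
<ul>
  <li class="result"><a href="/item/101.html">Desk</a></li>
  <li class="result"><a href="/item/102.html">Lamp</a></li>
  <li class="nav"><a href="/search?page=2">next</a></li>
</ul>
"""

def result_links(listing_source, page_url):
    """Collect the absolute URL of every link that matches the sample's pattern."""
    root = ET.fromstring(listing_source.strip())
    links = root.findall(".//li[@class='result']/a")
    # Relative links are resolved against the page they were found on.
    return [urljoin(page_url, a.get("href")) for a in links]

for url in result_links(LISTING, "https://example.org/search?page=1"):
    print(url)
```

The "next" link is left out because it does not match the pattern, just as only the links you want should turn red in the scraper.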

4. Choose the Data:

Step 4: Choose the "Select Data" button. Use your mouse to create an orange highlight around an example piece of data that you want to scrape. Right click.

In this step, you should imagine each piece of data that you want to select as a separate row in a spreadsheet. The results will appear that way. You will want to highlight individual pieces of data and right click them. 

In the window that appears, use the checkboxes to choose which pieces of data inside your selection you want to scrape. Use the "Field Name" window to change the name of your "Column" of data in the CSV. The resulting column name will be your field name, a dot, and the attribute listed beside the checkbox (for example, Description.Inner_Text).

TIP: You can hit cancel if your selection isn't accurate. If the sample page isn't a good fit (for example, some pages contain data you'd like to scrape that isn't on this one), you can hit the "Back" button and load a new sample result page, as long as you haven't highlighted and chosen any data yet.

5. Save and Run Your Template:

Step 5: Ready to run your template? Click "Save Template", name it something you will remember, and click OK. Then click Run Template and choose your saved template.

TIP: Your scraper will run until it can no longer find a "Next" button. Sometimes, the HTML code that the scraper uses to recognize the next button continues to exist after it is no longer clickable. In those instances, the scraper will need to be shut down manually. (You will notice it repeating the final page.)

6. View Your Results:

Step 6: Load your results by visiting the Outscrape_Sessions folder.

The Sessions folder is located inside your Outscrape directory. Your results will usually be in the last (newest) session folder. The folders are named according to the date and time they were created:

Session_170630104525 = 2017/06/30 10:45:25

The file to open will be called "output_[TemplateName].csv". This file can be opened while results are still being saved, even if it appears to be 0 KB. Don't edit and save the file, however, until the scraper has stopped. 
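
To find the newest session folder and its output file without browsing by hand, a short script can do the lookup. This Python sketch assumes the folder and file naming described above. The newest_session and output_file helpers are hypothetical, and you would supply the path to your own Outscrape_Sessions folder.

```python
from pathlib import Path

def newest_session(sessions_dir):
    """Return the most recent Session_* folder, or None if there is none."""
    folders = [
        p for p in Path(sessions_dir).iterdir()
        if p.is_dir() and p.name.startswith("Session_")
    ]
    # The timestamp is fixed-width (YYMMDDHHMMSS), so names sort by time.
    return max(folders, key=lambda p: p.name, default=None)

def output_file(sessions_dir, template_name):
    """Path of output_[TemplateName].csv inside the newest session, if any."""
    session = newest_session(sessions_dir)
    return None if session is None else session / f"output_{template_name}.csv"

# Example call (both arguments are placeholders for your own values):
# print(output_file("path/to/Outscrape_Sessions", "MyTemplate"))
```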


Video Tutorials and Walkthroughs


Have more questions? Watch each 10-20 minute video to get a better understanding of how the software works.

Basic Introduction:

The Simple Scraper grabs data from Craigslist and Google, two popular sites, in this quick feature demo.


In-Depth Walkthrough

Demonstration of scraping TEDx (an event site), Yelp (a review site), and Slickdeals (a forum). Watch this in-depth tutorial to see more of how the Simple Scraper works.


Scraping from a List of URLs

Just load a list of URLs from a CSV while building your scraping template, and as long as all the pages are formatted the same way, you can scrape every single one of them!
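
If your URLs follow a pattern, you can generate the list rather than typing it. The Python sketch below writes one URL per row with no header, which is an assumption about the layout; check the video for the exact format the scraper expects. The example.com addresses and the write_url_list helper are placeholders.

```python
import csv
import io

def write_url_list(urls, csv_file):
    """Write one URL per row (assumed layout: single column, no header)."""
    writer = csv.writer(csv_file)
    for url in urls:
        writer.writerow([url])

urls = [f"https://example.com/item/{n}" for n in range(1, 4)]

# An in-memory buffer stands in for a real file here; to save to disk,
# pass open("urls.csv", "w", newline="") instead.
buffer = io.StringIO()
write_url_list(urls, buffer)
print(buffer.getvalue())
```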


Scrape Data from a Link on a Result Page (3rd Tier)

Outscrape's Simple Scraper can even scrape data three tiers deep. For example, you could load a series of product search pages, visit the page for each resulting product, and from there open the author's profile and scrape data from their page. 

Troubleshooting, Tips, and Tricks


While it is easy to use, the Simple Scraper isn't perfect! It sometimes requires a little extra work on certain pages. This project is essentially an alpha version of the final product - I am working on a much more robust version. When that is released, you will get first dibs (and a discount equal to how much you paid for this scraper).

My Virus Scanner is Reporting the Simple Scraper

Because the Simple Scraper uses an automated browser to scrape according to the template that you build, some virus scanners may report it as a potential threat. I assure you it is not a threat. This error may appear during installation, or when you first load the software.

You may need to whitelist the software to let it run. If you'd like instructions on this, search the web for "Whitelist [your virus scanner]". 

You can find the VirusTotal report for the Setup File and the Outscrape.exe file below.

The Simple Scraper isn't Getting the Data I Expect!

This is usually caused by minor changes in the formatting of the pages, or of how the data is structured from page to page, which isn't obvious to you but makes a big difference to the Simple Scraper. Sometimes the result is that you see big sections missing in your data, or almost no data at all.

To make sure you get the correct data, follow these steps: 

  • Try selecting different samples of your data. Select different regions, on different pages, as well. A good example of an issue you might experience: Some Google result pages include videos or images at the top, especially on the first page. Try starting on the second page to limit the variations in formatting. 
  • Try multiple templates. If there is a huge change in formatting that means you can't scrape all the data with a single template, just create two separate templates! You can see an example of this in the in-depth tutorial, around 5 minutes in.
  • If you see big sections of data missing, make sure it's actually on the pages that you are using! Often, a selection of pages will contain information that you want, but others won't. The result will be a CSV file with gaps in the data, which is to be expected - you can't scrape what isn't there!
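
To see how large the gaps are before re-selecting samples, you can count the blank cells in each column of the output CSV. This is a rough Python sketch that assumes missing data shows up as empty cells. The column names and the blank_counts helper are made up for illustration.

```python
import csv
import io
from collections import Counter

def blank_counts(csv_file):
    """Return (number of rows, {column name: number of blank cells})."""
    reader = csv.DictReader(csv_file)
    blanks = Counter({name: 0 for name in reader.fieldnames or []})
    total = 0
    for row in reader:
        total += 1
        for name, value in row.items():
            # Skip overflow cells (key None); treat missing values as blank.
            if name is not None and not (value or "").strip():
                blanks[name] += 1
    return total, dict(blanks)

sample = io.StringIO(
    "Title.Inner_Text,Price.Inner_Text\r\n"
    "Chair,$20\r\n"
    "Table,\r\n"
    "Lamp,$15\r\n"
)
print(blank_counts(sample))
```

A column that is blank on only some rows usually points to pages that simply lack that data, while a column that is blank everywhere suggests the sample selection needs redoing.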

The Scraper is Taking a Really Long Time / Uses a Lot of Memory

  • The scraper takes a long time between pages! You may have found a page that loads a lot of images or JavaScript. It might not be possible to scrape any faster, but there may be workarounds, such as using a different version of the site. Contact me if you're unsure.
  • The scraper loops on the last page! This might be because it doesn't recognize that the "Next Page" button no longer exists. Check the data that it's scraping - is this data repetitive, especially at the very end? If so, you may simply need to manually turn off the scraper when it's finished. Unfortunately there is no way around this.
  • The scraper can grow in size when you have extremely large websites or a long list of pages.  I HIGHLY recommend running it on a Virtual Private Server so that it does not interfere with your regular work. You can get a free Amazon Virtual Private Server. For instructions, see this blog post.
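
One way to check whether the scraper has started looping on the last page is to look for the point where rows in the output CSV begin to repeat. The Python sketch below is only a heuristic, since some sites list the same item twice for legitimate reasons. The sample data and the first_repeat helper are made up.

```python
import csv
import io

def first_repeat(rows):
    """Index of the first row that exactly repeats an earlier row, else None."""
    seen = set()
    for index, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            return index
        seen.add(key)
    return None

sample = io.StringIO(
    "Title.Inner_Text,Price.Inner_Text\r\n"
    "Desk,$40\r\n"
    "Lamp,$15\r\n"
    "Chair,$20\r\n"
    "Chair,$20\r\n"
)
rows = list(csv.reader(sample))[1:]  # skip the header row
print(first_repeat(rows))  # 3
```

If the repeats begin near the end and continue to the last row, that is a good sign the scraper has finished and can be shut down.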

    Additional Suggestion: After a manual shutdown, make an additional template that picks up where your bot stopped running. If you find that the software moves slowly, or you check the Task Manager and it's using a lot of your memory, I recommend using the log file (which you can find in the Sessions folder for the appropriate template) to see the last relevant URL that was visited and scraped.

    Then, you should be able to copy the template and make a duplicate of it. Change the URL that the template starts on to the appropriate last URL, and run that new template. The URL you're looking for is in the 

    Another Suggestion (Advanced): Set a timer to end the scraping after a certain number of minutes. I've created a simple  batch script here that should help. 

    To run it, you just need to place it on your hard drive, and open your command prompt.  Enter the directory where you placed it, and type

    kill.bat 500 Outscrape.exe

    kill.bat is the name of the script. 500 is the number of seconds to wait before shutting down the Scraper. Outscrape.exe is the name of the software process you will be closing.

    You'll run the template first, and then run the kill.bat script.
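
If you prefer Python to a batch file, the same wait-then-close idea can be sketched as follows. This is not the contents of kill.bat, which are not shown on this page. It is a rough equivalent that assumes Windows' built-in taskkill command, and the kill_command and kill_after helpers are hypothetical names.

```python
import subprocess
import time

def kill_command(process_name):
    """Build the taskkill call: /IM selects by image name, /F forces the close."""
    return ["taskkill", "/IM", process_name, "/F"]

def kill_after(seconds, process_name, sleep=time.sleep, run=subprocess.run):
    """Wait the given number of seconds, then ask Windows to end the process."""
    sleep(seconds)
    return run(kill_command(process_name), check=False)

# Start the template first, then call (Windows only):
# kill_after(500, "Outscrape.exe")
```

The sleep and run parameters exist only so the function can be tried out without actually waiting or closing anything.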

I Need to Scrape Data That Is Behind A Login

Unfortunately, this isn't possible with the Simple Scraper at this time. I'm working on a solution though - update coming soon! 

I Need to Scrape Data On An Infinite Scroll Page

Pages like Twitter or Facebook that require scrolling down to load cannot be scraped with the Simple Scraper. Again, I am currently working on a way to do this. More soon!