Getting Started With Outscrape

Basic Commands Reference

When you open Outscrape, you'll see these commands in the first panel on the left. Press "Scrape New Project" to see the next set of commands (below).

  • Scrape New Project
  • Open a saved project
  • Open a ticket

Scrape New Project: Use this to build a new scraping template, then choose "A series of web pages." Additional scraping features, such as scraping images, are on the way.

After choosing this command, enter the URL of the site you want to scrape in the next panel and press "Let's get started".

This URL is often the first page of search results, a category page on a directory website, a login page, or a page where you must enter a search term and press a button to load the next page.

  • Select Data to Scrape
  • Select Next Page Button
  • Select Load More Button
  • Select URLs to Visit
  • Select Button to Click
  • Select Place to Input Text

Select Button to Click: Use this command together with "Select Place to Input Text" to log in to pages, or to enter text into a field and press a button like "Search" or "Go."
You can also use it on its own if you have to press a button to begin scraping for any reason.

This button will only be clicked *once*.

Before You Start:

  • Find a URL that contains data or links to data you want to scrape, or
  • Put together a list of URLs linking to pages that are formatted the same way and contain the text, HTML, or element attribute data you want to scrape

Here's what MIGHT cause you problems:

  • Pages that are not structured or formatted the same way (you can often get data from these, but it may be inconsistent).
  • Scraping images (you CAN scrape the URLs of images and download them with other tools).
  • Captcha solving. Outscrape doesn't handle this (yet).

Have a question or need support? Visit the support desk.

Download your free trial here.

How Does Outscrape Work?

Outscrape combines an automated web browser with XPath, a query language for selecting elements in XML and HTML documents. When you highlight parts of a website using the scraper, it uses AI to work out an XPath expression that locates the element(s) you selected. The automated browser then loads the pages you've entered, and the XPath expressions determine which data is scraped.
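To make the XPath idea concrete, here is a minimal sketch in Python that uses only the standard library's `xml.etree.ElementTree`. This is not how Outscrape works internally. ElementTree supports only a small subset of XPath and needs well-formed markup, whereas Outscrape evaluates its expressions in a real browser. The page markup, the class names, and the helpers `select_texts` and `select_attr` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a search results page.
PAGE = """
<html>
  <body>
    <ul class="results">
      <li class="result"><a href="/item/1">Blue bike</a><span class="price">$120</span></li>
      <li class="result"><a href="/item/2">Red bike</a><span class="price">$95</span></li>
    </ul>
    <a class="next" href="/page/2">Next</a>
  </body>
</html>
"""


def select_texts(markup: str, xpath: str) -> list[str]:
    """Return the text of every element the XPath expression matches."""
    root = ET.fromstring(markup.strip())  # root is the <html> element
    return [el.text or "" for el in root.findall(xpath)]


def select_attr(markup: str, xpath: str, attr: str) -> list[str]:
    """Return one attribute value from every element the XPath expression matches."""
    root = ET.fromstring(markup.strip())
    return [el.get(attr, "") for el in root.findall(xpath)]


if __name__ == "__main__":
    print(select_texts(PAGE, ".//li[@class='result']/a"))    # result titles
    print(select_texts(PAGE, ".//span[@class='price']"))     # result prices
    print(select_attr(PAGE, ".//a[@class='next']", "href"))  # Next link target
```

In this sketch, one expression matches every result title at once. The last expression picks out the same kind of element you would click with "Select Next Page Button".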

Of course, you don't need to know any of that - you just need to click a few buttons and it will do the rest. It's that simple!

I've included some sample templates for you to play with.

You can find them here.

Scraping a series of similar pages

This first tutorial walks through scraping search results from Craigslist. You can think of this as a flat, or horizontal, scraping template: it scrapes data from the search results pages without visiting the individual result pages. A similar example would be scraping data from Google search result pages without loading the results themselves.

1. Let's Start Scraping:

Step 1: Click the "Scrape New Project" button.

Then click "A series of web pages." In the window that appears, enter a URL that contains data you want to scrape. In this example, that is a Craigslist search page.

TIP: This tutorial also shows how to use the commands for entering text into a search field!

2. Choose the Next Button (Or Arrow):

Step 2: Choose the "Select Next Page Button" command.

I like to do this first so I don't forget it, but the order doesn't matter. Locate the button, text, or element you click to get to the next page. For example, on Google this is either the word "Next" or the ">" arrow.

Click it. It should be highlighted to show that it has been selected.

TIP: If there is no Next button, watch the tutorial on scraping from lists of URLs, or the tutorial on scraping infinite scroll websites!

Your scraper will run until it can no longer find a "Next" button. Sometimes the HTML code the scraper uses to recognize the Next button continues to exist after the button is no longer clickable. In those cases, the scraper will need to be shut down manually. (You will notice it repeating the final page.)
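The stopping rule can be sketched as a loop. This is a hypothetical Python illustration, not Outscrape's code. The in-memory `SITE` dictionary stands in for real pages. The repeat check, which stops when a page yields exactly the same rows as a page already scraped, is an assumed extra guard; according to this guide, Outscrape itself must be stopped manually in this situation.

```python
from typing import Optional

# Hypothetical site: URL -> (rows scraped on that page, href of its Next button or None).
# Page 3 still has a "Next" element that just reloads page 3, like a button whose
# HTML survives after it stops working.
SITE = {
    "/page/1": (["a", "b"], "/page/2"),
    "/page/2": (["c", "d"], "/page/3"),
    "/page/3": (["e"], "/page/3"),
}


def crawl_next_pages(start: str, site: dict, max_pages: int = 100) -> list[str]:
    """Follow Next links until there is none, a page repeats, or max_pages is reached."""
    rows: list[str] = []
    seen: set[tuple[str, ...]] = set()
    url: Optional[str] = start
    for _ in range(max_pages):
        if url is None:
            break  # no Next button found: the normal way a run ends
        page_rows, next_url = site[url]
        fingerprint = tuple(page_rows)
        if fingerprint in seen:
            break  # assumed guard: identical content means we are looping on the last page
        seen.add(fingerprint)
        rows.extend(page_rows)
        url = next_url
    return rows


if __name__ == "__main__":
    print(crawl_next_pages("/page/1", SITE))
```

One trade-off is that a content-based guard like this would also stop early if two genuinely different pages happened to produce identical rows. `max_pages` is a separate hard cap.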

3. Choose the Data's Region:

Step 3: Choose the "Select Data To Scrape" command.

Before you can choose the data you want to scrape, you have to draw a box around it to tell Outscrape where it is on the page. We call this "creating a region." Click "Create new region", then move your mouse cursor over some of the data you want to scrape. You'll notice a rectangular highlight. Left-click when the highlight covers the correct area. A new highlight will then appear on all the samples of data that match your selection.

Your goal here is to move the mouse until the highlight covers one complete example of the data you want to scrape. For example, if you want to scrape the URL and description of a Google search result, highlight the first result on the page, making sure the rectangle covers its data entirely.

TIP: You can repeat this highlighting, so if you make a mistake, simply go to the "Templates" tab and delete the region or data you created by clicking the "X". And if you cannot highlight all of the data at once, that's fine too: you can create multiple regions and select multiple pieces of data individually with them!
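Conceptually, a region acts like a container. Each match of the region becomes one record, and each piece of data is looked up inside that container. Here is a minimal Python sketch of that idea using the standard library's `ElementTree`, which supports only a limited XPath subset. The markup, the `FIELDS` mapping, and `scrape_regions` are hypothetical and do not describe Outscrape's internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each <div class="result"> is one region.
PAGE = """<div id="results">
  <div class="result"><a href="/item/1">Blue bike</a><span class="price">$120</span></div>
  <div class="result"><a href="/item/2">Red bike</a></div>
</div>"""

# Hypothetical fields: name -> path relative to the region element.
FIELDS = {"title": "a", "price": "span[@class='price']"}


def scrape_regions(markup: str, region_xpath: str, fields: dict[str, str]) -> list[dict[str, str]]:
    """Return one record per region; a field missing from a region becomes an empty string."""
    root = ET.fromstring(markup)
    records = []
    for region in root.findall(region_xpath):
        record = {}
        for name, rel_path in fields.items():
            el = region.find(rel_path)  # searched only inside this region
            record[name] = (el.text or "") if el is not None else ""
        records.append(record)
    return records


if __name__ == "__main__":
    for record in scrape_regions(PAGE, ".//div[@class='result']", FIELDS):
        print(record)
```

Each field is looked up inside its own region, so the second record gets an empty `price` rather than borrowing a price from a neighboring result.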

4. Choose the Data:

Step 4: When you're happy with the region you've created, click the "Next" button.

In this step, imagine each piece of data you want to select as a separate row in a spreadsheet; the results will appear that way. Highlight an example piece of data and click it.

Then click another example of the same data. The software will determine all the examples of this data based on your two samples. You can name the rows of your data if you'd like; otherwise they are named after the elements.
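Why do two samples help? One simple way to see it is to compare the paths to the two examples and drop whatever differs between them. Outscrape uses AI for this step. The Python sketch below swaps in a much simpler heuristic that only handles paths differing in positional indexes, and `generalize_xpaths` is a hypothetical helper.

```python
import re

_INDEX = re.compile(r"\[\d+\]$")  # a trailing positional index such as [3]


def generalize_xpaths(first: str, second: str) -> str:
    """Merge two absolute XPaths that differ only in positional indexes."""
    a_steps = first.strip("/").split("/")
    b_steps = second.strip("/").split("/")
    if len(a_steps) != len(b_steps):
        raise ValueError("samples are at different depths")
    merged = []
    for a, b in zip(a_steps, b_steps):
        if a == b:
            merged.append(a)  # identical step: keep it, index and all
            continue
        a_tag, b_tag = _INDEX.sub("", a), _INDEX.sub("", b)
        if a_tag != b_tag:
            raise ValueError(f"samples differ in structure: {a!r} vs {b!r}")
        merged.append(a_tag)  # index differs: drop it so every sibling matches
    return "/" + "/".join(merged)


if __name__ == "__main__":
    print(generalize_xpaths("/html/body/ul/li[1]/a", "/html/body/ul/li[2]/a"))
```

Here the two sample paths merge into `/html/body/ul/li/a`, which matches the link in every list item. If the samples sit at different depths, the heuristic refuses to merge them.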

TIP: If the data you select includes a link, Outscrape will ask in this step whether you want to visit the link and select data on the pages it finds. This is how you can build multi-level scrapers! Or you can just click "Nope", and it will save the links (to visit later, or just to keep).

5. Enter Data Into the Search Field and Click the Button:

Step 5: This step is optional! But, it's good to see how you might enter data into a login or search box.

If you want to enter text into a text field whenever your scraper starts (on the very first page), choose the "Select Place to Input Text" command. Click the text box, then type the text you want into the popup that appears. It will be entered into the text box automatically when your scraper first loads the page.

To go along with this command, you'll often use the "Select Button to Click" command. Click it, and then choose the button you'd like to click. The scraper will automatically click the button once, after the text has been entered on the first page.

6. Save and Run Your Template:

Step 6: Ready to run your template? Click the "Template" tab. If you'd like to save the template, click "Save Template", give it a name you will remember, and click OK. Then click "Start Scraping"!

There are several options in the Settings tab that you can change here. They include:

- Name of results file: defaults to the name of the website and the time if nothing is entered.
- Show pages during scraping: by default, pages are scraped in the background (headless) without being displayed. Turn this on if you want to watch the scraper in action.
- Wait between navigation (ms): throttles scraping by pausing for the given number of milliseconds before loading each following page.
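The "Wait between navigation (ms)" setting can be pictured as a pause between page loads. Below is a hypothetical Python sketch. `load` stands in for the browser loading a page. The first page is assumed to load without a delay, since the setting is described as pausing before the following pages.

```python
import time
from typing import Callable, Iterable


def load_with_wait(urls: Iterable[str], load: Callable[[str], str], wait_ms: int) -> list[str]:
    """Load each URL in order, pausing wait_ms milliseconds before every page after the first."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0 and wait_ms > 0:
            time.sleep(wait_ms / 1000)  # the setting is in ms; time.sleep() takes seconds
        pages.append(load(url))
    return pages


if __name__ == "__main__":
    start = time.monotonic()
    pages = load_with_wait(["/p1", "/p2", "/p3"], lambda url: f"<html>{url}</html>", wait_ms=200)
    print(pages, f"{time.monotonic() - start:.1f}s")  # two pauses of 0.2 s each
```

A longer wait makes a run slower but puts less load on the site and is less likely to trigger rate limits.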

7. View Your Results:

Step 7: Load your results from the Sessions folder in your Documents folder.

By default, the latest scraped results and log are in a folder with a name like "ebay_com_2017-11-21-21-33-52". The format is website_year-month-day-hour-minute-second.
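Because the folder name follows the website_year-month-day-hour-minute-second pattern, it can be built and parsed mechanically, for example to find the most recent session. Below is a hypothetical Python sketch. The rule that dots in the site name become underscores is inferred from the "ebay_com" example and has not been confirmed.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"  # year-month-day-hour-minute-second


def session_folder_name(site: str, when: datetime) -> str:
    """Build a name like ebay_com_2017-11-21-21-33-52 (dots -> underscores is an assumption)."""
    return f"{site.replace('.', '_')}_{when.strftime(STAMP_FORMAT)}"


def session_time(folder_name: str) -> datetime:
    """Read the timestamp back from the part after the last underscore."""
    return datetime.strptime(folder_name.rsplit("_", 1)[1], STAMP_FORMAT)


def latest_session(folder_names: list[str]) -> str:
    """Pick the most recent session folder by its embedded timestamp."""
    return max(folder_names, key=session_time)


if __name__ == "__main__":
    print(session_folder_name("ebay.com", datetime(2017, 11, 21, 21, 33, 52)))
    # ebay_com_2017-11-21-21-33-52
```

Sorting by the embedded timestamp, rather than by the folder name, still works when sessions for different websites sit in the same folder.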

Multi-Level Crawling and Scraping

Outscrape makes multi-level scraping easy!

To do multi-level scraping, simply revisit Step 4 (Choose the Data) above.

When Outscrape recognizes that a link is inside the data you selected, it will ask whether you'd like to visit it.
Choose "OK", and then build the next part of your template. This automatically creates a multi-level scraping template: a scraper that scrapes links, visits them, and scrapes data from those pages!
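Structurally, a multi-level template is a nested loop. It scrapes the links on each listing page, then visits each link and scrapes the data there. Below is a minimal Python sketch under stated assumptions. `get_links` and `get_details` are hypothetical stand-ins for what the browser and your template would do, and the dictionaries replace real pages. A deeper template would nest another loop in the same way.

```python
from typing import Callable

# Hypothetical pages: listing URL -> links found in the selected data,
# and detail URL -> data scraped on that linked page.
LISTINGS = {
    "/search?page=1": ["/item/1", "/item/2"],
    "/search?page=2": ["/item/3"],
}
DETAILS = {
    "/item/1": {"title": "Blue bike", "seller": "ann"},
    "/item/2": {"title": "Red bike", "seller": "bo"},
    "/item/3": {"title": "Green bike", "seller": "cy"},
}


def scrape_two_levels(
    listing_urls: list[str],
    get_links: Callable[[str], list[str]],
    get_details: Callable[[str], dict[str, str]],
) -> list[dict[str, str]]:
    """Level 1 collects links from each listing page; level 2 scrapes each linked page."""
    rows = []
    for listing_url in listing_urls:
        for link in get_links(listing_url):
            row = {"listing": listing_url, "link": link}
            row.update(get_details(link))  # a third level would loop over links found here
            rows.append(row)
    return rows


if __name__ == "__main__":
    rows = scrape_two_levels(list(LISTINGS), lambda url: LISTINGS[url], lambda url: DETAILS[url])
    for row in rows:
        print(row)
```

Each output row keeps the listing page and link it came from. That makes it easy to trace a detail page back to the search result that led to it.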

Video Tutorials and Walkthroughs

Have more questions? Watch the 10-minute video below.

Basic Introduction:

Learn what each command does in this basic introduction. 

In-Depth Walkthrough

Demonstration of scraping Craigslist.

Scraping on Infinite Scroll Sites

Grab data from pages that don't have a "Next Page" button, but instead require scrolling down to load new data.

Outscrape can even scrape data 2, 3, or any number of pages deep. For example, you could load a series of product search pages, visit the pages for each resulting product, and from there enter the author's profile and scrape data from their page. Here's an example from Yelp.

Troubleshooting, Tips, and Tricks

While it is easy to use, Outscrape isn't perfect! It sometimes requires a little extra work on certain pages.

Outscrape isn't Getting the Data I Expect!

This is usually caused by minor changes in the formatting of the pages, or in how the data is structured from page to page. These changes may not be obvious to you, but they make a big difference to Outscrape. Sometimes the result is big sections missing from your data, or almost no data at all.

To make sure you get the correct data, follow these steps: 

  • Try selecting different samples of your data. Select different regions, on different pages, as well. A good example of an issue you might experience: Some Google result pages include videos or images at the top, especially on the first page. Try starting on the second page to limit the variations in formatting. 
  • Try multiple templates. If there is a huge change in formatting that means you can't scrape all the data with a single template, just create two separate templates! 
  • If you see big sections of data missing, make sure the data is actually on the pages you are using! Often, some pages contain the information you want but others don't. The result will be a JSON file with gaps in the data, which is to be expected: you can't scrape what isn't there!
  • The scraper loops on the last page! This might be because it doesn't recognize that the "Next Page" button no longer works. Check the data it's scraping: is it repetitive, especially at the very end? If so, you may simply need to turn off the scraper manually when it's finished. Unfortunately, there is no way around this.
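To check whether gaps or repeats in your data come from the pages or from the template, you can inspect the results file. The Python sketch below assumes the results JSON is a list of flat objects, one per scraped item. That layout is an assumption, so adapt the loading step to what your file actually contains. `missing_field_counts` and `duplicate_count` are hypothetical helpers.

```python
import json
from collections import Counter


def missing_field_counts(records: list[dict], fields: list[str]) -> dict[str, int]:
    """For each field, count records where it is absent, null, or an empty string."""
    counts: Counter = Counter()
    for record in records:
        for field in fields:
            if record.get(field) in (None, ""):
                counts[field] += 1
    return {field: counts[field] for field in fields}


def duplicate_count(records: list[dict]) -> int:
    """Count records that exactly repeat an earlier record (e.g. a last page scraped twice)."""
    unique = {json.dumps(record, sort_keys=True) for record in records}
    return len(records) - len(unique)


if __name__ == "__main__":
    # Assumed layout: a JSON array with one flat object per scraped item.
    sample = '[{"title": "A", "price": "$1"}, {"title": "B"}, {"title": "B"}]'
    records = json.loads(sample)
    print(missing_field_counts(records, ["title", "price"]))  # {'title': 0, 'price': 2}
    print(duplicate_count(records))  # 1
```

To check a real file, load it with `json.load` on an opened file instead of `json.loads` on the sample string. A high duplicate count concentrated at the end of the file points to the last-page loop rather than to missing data.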

I Need to Scrape Data That Is Behind A Login

Great! Combine the "Select Place to Input Text" and "Select Button to Click" commands: enter your login details into the appropriate fields, then have the scraper click the login button.

I Need to Scrape Data On An Infinite Scroll Page

Great! Use the "Select Load More Button" command.
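The idea behind loading more content is simple: keep triggering the next batch until nothing new appears. Below is a hypothetical Python sketch. `click_load_more` stands in for one click or scroll and returns the items visible afterwards. Because the sketch de-duplicates, it works whether each call returns only the new batch or the whole, growing list.

```python
from typing import Callable


def load_all_items(click_load_more: Callable[[], list[str]], max_clicks: int = 50) -> list[str]:
    """Trigger "load more" until a round adds nothing new, or max_clicks is reached."""
    items: list[str] = []
    seen: set[str] = set()
    for _ in range(max_clicks):
        added = 0
        for item in click_load_more():  # items visible after one more click/scroll
            if item not in seen:
                seen.add(item)
                items.append(item)
                added += 1
        if added == 0:
            break  # nothing new appeared: assume the end of the feed
    return items


if __name__ == "__main__":
    # Hypothetical feed: each call returns the whole, growing list, as many pages do.
    batches = iter([["a", "b"], ["a", "b", "c"], ["a", "b", "c"]])
    print(load_all_items(lambda: next(batches, [])))  # ['a', 'b', 'c']
```

`max_clicks` is a safety cap for feeds that never run out, much like stopping the scraper by hand.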