If you're in Outscrape, you'll see these commands in the first panel on the left. Press "Scrape New Project" to see the next set of commands (below).
Use this to build a new scraping template. Then choose "A series of web pages." Additional scraping features, like scraping images, are on the way.
After using this command, you will need to enter the URL of the site you want to scrape on the next panel and press "Let's get started".
This URL is often the first page of search results, the category of a directory website, a login page, or a page that requires entering a search term and pressing a button to load the next page.
This loads a saved template. Press the "Load Template" button and choose a template (a .json file you've made).
Click here for some pre-made templates that you can run to test the software and see how it works.
Use this to open a support ticket. This works if you have questions about anything—billing, feature requests, bug reports, or just questions on how to use the software.
This command will begin the process of selecting data on a page to scrape.
After pressing this button, you will see a new series of commands.
Do you get to new pages of data by clicking a button?
This command lets you visit those multiple pages. To use it, choose the command and click the "Next Page" link to add it to your template. This link is often labeled "Next" or "Next Page", or shown as an arrow.
When this command is inside your template, the chosen link will be "clicked" after a page has been scraped, moving you to the next page. This will continue until the link is no longer available.
A note: in some rare instances, the site's "Next page" button will still "exist" even though it is no longer clickable. In those instances, the scraper may fail to stop scraping.
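To make the paging behavior concrete, here is a minimal simulation in Python. It is only a sketch of the idea, not Outscrape's actual code: the `PAGES` data, the `scrape_all` function, and the `max_pages` cap are all hypothetical. It follows the "Next" link after each page and also stops if the link leads back to a page it has already seen, which is one way to guard against a "Next" button that still exists but no longer goes anywhere.

```python
# Simulated site (hypothetical data): each page has some rows and the
# page its "Next" link points to. None would mean there is no link.
PAGES = {
    "page1": {"rows": ["a", "b"], "next": "page2"},
    "page2": {"rows": ["c", "d"], "next": "page3"},
    # The last page still has a "Next" link, but it leads nowhere new.
    "page3": {"rows": ["e"], "next": "page3"},
}


def scrape_all(start, max_pages=50):
    """Follow "Next" until it is missing, repeats a page, or a cap is hit."""
    results = []
    seen = set()
    current = start
    while current is not None and current not in seen and len(seen) < max_pages:
        seen.add(current)
        page = PAGES[current]
        results.extend(page["rows"])  # scrape the current page
        current = page["next"]        # then "click" the Next link
    return results


rows = scrape_all("page1")
print(rows)
```

Without the `seen` check, the loop above would keep re-scraping the final page, which matches the symptom described in the note.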
This command allows you to scrape "Infinite Scroll" style pages, where data appears by scrolling down, or by clicking a "Load More" button.
To use, choose this command, then press "OK" in the popup that follows if you'd like to select the "Load More" button, or press "Nope" if you'd like to allow the page to scroll down infinitely (or a specified number of times) without pressing a "Load More" button first.
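The difference between scrolling indefinitely and scrolling a specified number of times can be sketched as follows. This is an illustrative simulation only: `FEED`, `load_batch`, and `scroll_and_collect` are hypothetical names, and a real page would load data through the browser rather than from a list.

```python
# Simulated feed (hypothetical): 7 items, delivered 3 at a time,
# as if each scroll or "Load More" click revealed one more batch.
FEED = [f"item {n}" for n in range(1, 8)]
BATCH_SIZE = 3


def load_batch(scroll_index):
    """Return the items revealed by the given scroll (empty when exhausted)."""
    start = scroll_index * BATCH_SIZE
    return FEED[start:start + BATCH_SIZE]


def scroll_and_collect(max_scrolls=None):
    """Scroll until no new data appears, or until max_scrolls is reached."""
    items = []
    scrolls = 0
    while max_scrolls is None or scrolls < max_scrolls:
        batch = load_batch(scrolls)
        if not batch:  # nothing new loaded, so stop
            break
        items.extend(batch)
        scrolls += 1
    return items


everything = scroll_and_collect()             # scroll until the feed runs out
first_two = scroll_and_collect(max_scrolls=2)  # stop after two scrolls
print(len(everything), len(first_two))
```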
If you've collected a bunch of URLs that you'd like to scrape, which are all formatted the same way, this command is for you.
This can be useful if there is no simple way to move from one page to the next, or if you've used Excel or another tool to build a list of URLs.
Just paste in the URLs or load a file that contains them, and your template will be applied to all of them.
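If you'd rather generate such a list with a script than with Excel, a short Python sketch like the one below would work. The site `https://example.com/search` and the `q` and `page` parameters are made up for illustration; substitute whatever pattern your target site actually uses.

```python
from urllib.parse import urlencode

# Hypothetical site and query parameters, for illustration only.
BASE_URL = "https://example.com/search"


def build_urls(query, pages):
    """Build one URL per results page, all formatted the same way."""
    return [
        f"{BASE_URL}?{urlencode({'q': query, 'page': n})}"
        for n in range(1, pages + 1)
    ]


urls = build_urls("used bikes", 3)
print("\n".join(urls))  # one URL per line, ready to paste or save to a file
```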
Use this command in conjunction with "Select Place to Input Text" to log in to pages, or to enter text into a field and press a button like "Search" or "Go."
This command can be used on its own, as well, if you have to press a button to begin scraping, for any reason.
This button will only be pushed *once*.
Use this command in conjunction with "Select Button to Click" to log in to websites, or to simply enter text into a field.
You *must* use the "Select Button to Click" command to press any button, however, so unless simply entering text into a field is useful, this command is unlikely to be used on its own.
Here's what MIGHT cause you problems:
Have a question or need support? Visit the support desk.
Outscrape combines an automated web browser with XPath, a query language for selecting elements in XML and HTML documents. When you highlight parts of a website using the scraper, it uses AI to determine the XPath that locates the element(s) you selected. The automated browser then loads the pages you've entered, and the XPath determines which data to scrape.
Of course, you don't need to know any of that - you just need to click a few buttons and it will do the rest. It's that simple!
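For the curious, here is roughly what an XPath selection looks like in practice. This sketch uses Python's built-in `xml.etree.ElementTree`, which supports a limited subset of XPath; it is not how Outscrape is implemented, and the sample markup and the `result` class name are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample markup: two real results and one ad.
HTML = """
<html>
  <body>
    <ul>
      <li class="result"><a href="/item/1">Red bike</a></li>
      <li class="result"><a href="/item/2">Blue bike</a></li>
      <li class="ad"><a href="/promo">Sponsored</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(HTML)

# XPath: every <a> directly inside an <li> whose class is "result".
links = root.findall(".//li[@class='result']/a")

# Each match becomes one row of scraped data: (text, link).
rows = [(a.text, a.get("href")) for a in links]
print(rows)
```

The single expression picks out both results and skips the ad, which is the kind of pattern a selection on the page is turned into.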
I've included some sample templates for you to play with.
You can find them here.
This first tutorial will go through scraping results from Craigslist. You can also think of this as a flat, or horizontal, scraping template. Similar examples of this would be scraping data from the search result pages on Google, without loading the results themselves, or in this case, scraping data from Craigslist search results without visiting the individual results pages themselves.
Step 1: Click the "Scrape a New Project" button.
Then click "a series of web pages." In the window that appears, enter a URL that contains data you want to scrape. In this example you can see a Craigslist search page.
TIP: This tutorial will also show how to use the buttons to enter text into a search field!
Step 2: Choose the "Select Next Page Button" command.
I like to do this first so I don't forget it, but the order doesn't matter. Locate the button, text, or element that you click to get to the next page. For example, on Google this is either the word "Next" or the ">" arrow.
Click this. It should highlight, to show that it has been selected.
TIP: If there is no next button, watch the tutorial on scraping from lists of URLs, or the tutorial on scraping infinite scroll websites!
Your scraper will run until it can no longer find a "Next" button. Sometimes, the HTML code that the scraper uses to recognize the next button continues to exist after it is no longer clickable. In those instances, the scraper will need to be shut down manually. (You will notice it repeating the final page.)
Step 3: Choose the "Select Data To Scrape" command.
Before you can choose the data you want to scrape, you have to draw a box around it to tell Outscrape where it is on the page. We call this "Creating a region." Click "Create new region" to create this "region" around the data you want. Move your mouse cursor over some of the data that you want to scrape. You'll notice a rectangular highlight. Left click when you highlight the correct area of the region you want to scrape. A new highlight will appear on all the samples of data that match your selection.
Your goal here is to move the mouse until you highlight all of an example of the data that you want to scrape. For example, if you want to scrape the URL and description of a Google search result, highlight the first of the results on the page, making sure the rectangle covers the data entirely.
TIP: You can do this highlighting more than once, so if you make a mistake, you can simply go to the "Templates" tab and delete the region or data you created by clicking the "X". And if you cannot highlight all of the data at once, that's fine too. You can create multiple regions and select multiple pieces of data individually with them!
Step 4: When you're happy with the region you've created, click the "Next" button.
In this step, you should imagine each piece of data that you want to select as a separate row in a spreadsheet. The results will appear that way. You will want to highlight an example piece of data and click it.
Then, click another example of this data. The software will then determine all the examples of this data based on your two samples. You can name the rows of your data if you'd like; otherwise they'll default to the elements' names.
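One way to picture how two samples can stand in for every matching item is to compare the paths of the two elements you clicked and drop the part that differs. The Python sketch below is only an intuition aid under that assumption; Outscrape's own matching is not documented here and may work quite differently. The `generalize` function and the sample paths are hypothetical.

```python
import re

# Matches a trailing position index such as "[3]" on a path step.
INDEX = re.compile(r"\[\d+\]$")


def generalize(xpath_a, xpath_b):
    """Merge two sample paths into one pattern that matches both."""
    steps_a = xpath_a.strip("/").split("/")
    steps_b = xpath_b.strip("/").split("/")
    if len(steps_a) != len(steps_b):
        raise ValueError("samples sit at different depths")
    merged = []
    for a, b in zip(steps_a, steps_b):
        if a == b:
            merged.append(a)                 # identical step: keep as-is
        elif INDEX.sub("", a) == INDEX.sub("", b):
            merged.append(INDEX.sub("", a))  # same tag, different position
        else:
            raise ValueError(f"cannot merge steps {a!r} and {b!r}")
    return "/" + "/".join(merged)


# The first and third results on a page (hypothetical paths).
pattern = generalize("/html/body/ul/li[1]/a", "/html/body/ul/li[3]/a")
print(pattern)
```

Dropping the position index is what lets a pattern built from two clicks cover the second, fourth, and every other result as well.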
TIP: If the data you select includes a link, Outscrape will ask if you want to visit the link in this step and select data on the pages that it finds. This is how you can build multi-level scrapers! Or, you can just click "Nope" and it will save the links (for you to visit later, or just to save).
Step 5: This step is optional! But, it's good to see how you might enter data into a login or search box.
If you want to enter text into a text field whenever your scraper starts (on the very first page), then choose the Select Text Box command. Click the text box, and enter the text you'd like to enter into the popup that appears. This will be automatically entered into the text box when your scraper first loads the page.
To go along with this command, you'll often use the Select Button to Click command. Click this, and then choose the button you'd like to click. The scraper will automatically click the button after the first
Step 6: Ready to run your template? Click the "Template" tab. Then click "Save Template" if you'd like to save it, name it something you will remember, and click OK. Then click Start Scraping!
There are several options in the Settings tab that you can change here. They include:
- Name of results file: this defaults to the name of the website and the time if nothing is entered.
- Show pages during scraping: by default, pages are scraped in the background (headless), without being displayed on screen. If you want to see the scraper in action, you can turn this on.
- Wait between navigation (ms): this throttles the scraping by pausing for the specified number of milliseconds before loading each following page.
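The effect of the "Wait between navigation (ms)" setting can be sketched like this. It is a simplified illustration, not Outscrape's code: `visit_all` and its `sleep` parameter are hypothetical, and the example records the pauses instead of actually waiting so it runs instantly.

```python
import time


def visit_all(urls, wait_ms=0, sleep=time.sleep):
    """Visit each URL in turn, pausing wait_ms milliseconds between pages."""
    visited = []
    for index, url in enumerate(urls):
        if index > 0 and wait_ms > 0:
            sleep(wait_ms / 1000)  # the setting is in ms; sleep takes seconds
        visited.append(url)        # stand-in for loading and scraping the page
    return visited


# Record the pauses rather than really sleeping.
pauses = []
visited = visit_all(["page1", "page2", "page3"], wait_ms=500, sleep=pauses.append)
print(visited, pauses)
```

Three pages produce two pauses, because the wait only applies between navigations.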
Step 7: Load your results by visiting the Sessions folder in your Documents.
By default the latest scraped results and log will be in a folder called something like "ebay_com_2017-11-21-21-33-52". The format of this is website_year-month-day-hour-minute-second.
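If you end up with many session folders, the naming format makes them easy to sort by script. Here is a small Python sketch that splits a folder name into its website and timestamp parts and picks the most recent one. The `parse_session_folder` function is hypothetical, and the second folder name in the list is invented for the example.

```python
from datetime import datetime

# year-month-day-hour-minute-second, as in the folder names.
STAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_session_folder(name):
    """Split 'website_timestamp' into (website, datetime)."""
    site, stamp = name.rsplit("_", 1)  # the timestamp follows the last "_"
    return site, datetime.strptime(stamp, STAMP_FORMAT)


folders = [
    "ebay_com_2017-11-20-09-05-00",  # invented example
    "ebay_com_2017-11-21-21-33-52",
]

# Pick the folder with the most recent timestamp.
latest = max(folders, key=lambda name: parse_session_folder(name)[1])
site, when = parse_session_folder(latest)
print(latest, site, when)
```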
To do multi-level scraping, simply revisit step 4—Choose Your Data—above.
When Outscrape recognizes that a link was inside of the data you selected, it will ask you if you'd like to visit it.
Choose "OK", and then build the next part of your template. This will automatically create a multi-level scraping template—a scraper that will scrape links, visit them, and scrape from those pages!
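Conceptually, a multi-level template does something like the following. This Python sketch simulates a tiny site with a dictionary; the `SITE` data and `scrape_multi_level` function are hypothetical and only illustrate the "scrape links, visit them, scrape those pages" flow.

```python
# Simulated two-level site (hypothetical data): a search page that lists
# links, and one detail page per link.
SITE = {
    "/search": {"links": ["/item/1", "/item/2"]},
    "/item/1": {"title": "Red bike", "price": "$120"},
    "/item/2": {"title": "Blue bike", "price": "$95"},
}


def scrape_multi_level(start):
    """Collect links from the first page, then scrape each linked page."""
    rows = []
    for link in SITE[start]["links"]:        # level 1: scrape the links
        detail = SITE[link]                  # level 2: visit each link
        rows.append({"url": link, **detail})  # one row per visited page
    return rows


rows = scrape_multi_level("/search")
for row in rows:
    print(row)
```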
Have more questions? Watch the 10-minute video below.
Learn what each command does in this basic introduction.
Demonstration of scraping Craigslist.
Grab data from pages that don't have a "Next Page" button, but instead require scrolling down to load new data.
Outscrape can even scrape data 2, 3, or any number of pages deep. For example, you could load a series of product search pages, visit the pages for each resulting product, and from there enter the author's profile and scrape data from their page. Here's an example from Yelp.
While it is easy to use, Outscrape isn't perfect! It sometimes requires a little extra work on certain pages.
This is usually caused by minor changes in the formatting of the pages, or in how the data is structured from page to page, which aren't obvious to you but make a big difference to Outscrape. Sometimes the result is that you see big sections missing in your data, or almost no data at all.
To make sure you get the correct data, follow these steps:
Great! Combine the "Select Button to Click" and "Select Place to Input Text" commands to enter text into the appropriate login fields.
Great! Use the "Load More" command.