Serpscraper Tutorial
Effective use of Serpscraper for building and cleaning url lists
Updated Tuesday, February 01, 2011
http://www.serpscraper.com

Table of contents
Introduction
What is a Target?
What is a Footprint?
Getting Started Automating it
Creating Footprint Files, Part 1
Creating Footprint Files, Part 2
Wordlists
Choosing Spiders
Shuffle User Agents
Using Proxies
Custom Referrer
Getting Serpscraper Running
Checking if SerpScraper is Working, Part 1
Checking if SerpScraper is Working, Part 2
Checking if SerpScraper is Working, Part 3
I have a raw list. Now What?
Verifying Footprints to clean your lists
Other Tabs
Intranet with dedicated one to one help

You are free to pass this tutorial on to anyone, but always make sure they know where
to download a fresh copy: https://support.syndk8.com/manuals/serpscraper-tutorial.pdf

Is anything in this document unclear or hard to understand? Email earl@syndk8.com and I will expand
on the section you are confused about or create a supplemental document to make everything clear.

Think of SerpScraper as a Google search on steroids.
When you search Google for a keyword string like “Buy Viagra”, you are presented
with a page of 10 organic results.


Each of those organic listings has a title, a snippet and a URL.
If you are using SerpScraper to feed an automated linking tool, you only want the URL part.
Stripping the URLs out of millions of search engine results manually would take too long, so
you need a tool like SerpScraper to extract exactly the part you need.
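
To make the "URL part only" idea concrete, here is a minimal sketch in Python. It is not SerpScraper's own code: the `OrganicResult` type, the `extract_urls` function and the sample data are all hypothetical names made up for illustration.

```python
from typing import NamedTuple


class OrganicResult(NamedTuple):
    """One organic listing with the three parts described above (hypothetical type)."""
    title: str
    snippet: str
    url: str


def extract_urls(results: list[OrganicResult]) -> list[str]:
    """Keep only the URL of each listing, dropping the title and snippet."""
    return [result.url for result in results]


# Hypothetical sample data standing in for one page of search results.
page = [
    OrganicResult("Example Site", "An example snippet.", "http://example.com/"),
    OrganicResult("Another Site", "Another snippet.", "http://example.org/page"),
]

for url in extract_urls(page):
    print(url)
```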


What is a target?
Let’s take an example: you have bought a tool like Autopligg and you want to feed it fresh
targets.

You would be very wise to keep feeding the tool fresh targets. Surprisingly, a lot of
people don’t do this and just hit the default lists or lists they have downloaded.

So what are targets, and the URLs you submit to?
A target is a website running a content management system such as Pligg (pligg.com).
Pligg site owners want people to submit stories and links to their website to build their content for
them.
A typical Pligg site owner wants to be the next .com millionaire and thinks they can build a
site like digg.com.
They never think that there are unscrupulous people like me and you who will submit links to
their site in huge volume, just for our own personal gain.
So a target is a site that allows you to submit a story, or just a post, with a link back to your
web page.
To retrieve the URLs of targets to feed your tools with, you need to find a
`footprint`.






What is a Footprint?
Almost every piece of software published on the internet has a unique piece of text or a URL format that
identifies the sites running it, so you can find them all with one or several Google searches.
For example, `powered by pligg` is a footprint that appears on all of the older Pligg bookmarking
platforms.
To be more specific in your footprint hunting, use: “powered by pligg” inurl:register.php
This footprint will identify the sites you need and give you a nice clean list.





Getting Started Automating it
To automate exactly what I have shown you above, we will use Serpscraper, an automated tool for Google
and the other search engines.
I have to assume you have a basic level of competency and can download and run a tool, so
this tutorial only explains how to use it.
Open SerpScraper by double-clicking the serpscraper2.exe file.
You should see the standard Serpscraper interface.

Ignore the Datamining Features in the top right-hand
corner, as they are just experimental at the moment.

Creating footprint files

Your Footprint list is at the top left, under Search Scraping.


To access the footprint files, click the button at the bottom.


Creating footprint files part 2
Clicking Show Footprint Directory will open the folder containing all your footprints.
The folder should contain some sample text files.
To add a new footprint, create a text file in Notepad called, for example, myfootprintname.txt

Put in the footprint you want to search for: “powered by pligg”, or whatever matches the target you are
looking for.
Save it.
Close the folder.
CLOSE SERPSCRAPER to activate it.
Then re-open Serpscraper.
Your new footprint should now appear in the dropdown list.
Note: When you are writing your footprints, you can use things like inurl:register.php ONLY in Google.
Other engines like Bing don’t support that command, so you will have to use a different
footprint or search term to get results from them.
This will all come with experience.
It seems hard now, but as you become a scraping Ninja you will take it all in your stride.





Wordlists
You need to have big ass wordlists to find the most URLs.
If you scrape with an empty wordlist you can only run each footprint once.
It will bring back very few results, because a search engine only returns a limited number of results for any one query.
To get a really diverse set of results for the same footprint you need to have a really diverse
wordlist.
You can get various wordlists and put them in a .txt file.
Same as before, close SerpScraper and re-open it afterwards to load and show the list.
There are some default lists in Serpscraper, but you can download a list with about 1 million
common words here. (big file)
So just to be clear:
Download that. Unzip it. Drop it into your wordlist directory. Close SerpScraper to activate it and
reopen.
You should now be able to choose that wordlist.
You can also use numbers or specific keyword lists.
Play, experiment, learn and have fun.
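
Why a wordlist multiplies your results can be sketched in a few lines of Python. This assumes each word is simply appended to the footprint to form a separate query; the tutorial does not spell out how SerpScraper combines them, so treat `build_queries` as a hypothetical illustration rather than the tool's actual behaviour.

```python
def build_queries(footprint: str, words: list[str]) -> list[str]:
    """Pair one footprint with every non-empty word to get distinct queries.

    Assumption: a query is the footprint followed by a single word.
    """
    queries = []
    for word in words:
        word = word.strip()
        if word:  # skip blank lines in the wordlist
            queries.append(f"{footprint} {word}")
    # With an empty wordlist there is only the bare footprint to search for.
    return queries or [footprint]


# Hypothetical three-line wordlist; a real one would be read from a .txt file.
for query in build_queries('"powered by pligg"', ["news", "sports", "music"]):
    print(query)
```

One footprint and three words give three different searches, each of which can return its own set of results; an empty wordlist gives exactly one.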







Choosing Spiders
There are currently around 40 search engine spiders in SerpScraper (at the time of writing this
doc).
I will keep emailing you new ones as I build them, or you can check
https://support.syndk8.com and look for the Spiders.zip download link.
Just choose the spider you want to use.
Remember, if you are using inurl: in a search, only Google supports it.
A quick way of getting a handle on how the local spiders work is to
take GoogleDOTcoDOTuk as an example.

The previous spider searched google.com.
Compare the results of these three searches.
The .com version, which is historically USA:
http://www.google.com/#sclient=psy&hl=en&q=Black+Hats+are+Sexy
The UK version, which is .co.uk:
http://www.google.co.uk/#sclient=psy&hl=en&site=&source=hp&q=Black+Hats+are+Sexy
And .co.uk restricted to UK-only results:
http://www.google.co.uk/#q=Black+Hats+are+Sexy&hl=en&prmd=ivns&source=lnt&tbs=ctr:countryUK|countryGB&cr=countryUK|countryGB
You can get up to 3 times as many results because you are targeting different datacenters.
As you can see, each one lists different sites most of the time.
With the 40 different spiders you should theoretically get 40x more results.
In practice some will overlap, because the search engines' geo-targeting is badly flawed.
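
Because results from different spiders overlap, the combined list needs de-duplicating. The following Python sketch shows one way to do that; it is not how SerpScraper does it internally, and `merge_unique` and the sample lists are hypothetical. It treats two URLs as the same when host (case-insensitive), path (ignoring a trailing slash) and query string match.

```python
from urllib.parse import urlsplit


def merge_unique(*result_lists: list[str]) -> list[str]:
    """Merge several URL lists, keeping the first copy of each URL in order."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            url = url.strip()
            parts = urlsplit(url)
            # Comparison key: lowercase host, path without trailing slash, query.
            key = (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical results from two different spiders.
from_com = ["http://example.com/", "http://a.example/x"]
from_co_uk = ["http://EXAMPLE.com", "http://b.example/y"]

for url in merge_unique(from_com, from_co_uk):
    print(url)
```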



Shuffle User Agents
The search engines don’t like people hitting them automatically to extract URLs, because
millions of automated queries a day drain their resources.

They are quite tolerant, but at some point they will get quite annoyed with you.
When the search engines get annoyed they will show you a CAPTCHA and temporarily block your
IP, so you want to do what you can to avoid detection, although this can often cause you more
problems than just risking a temporary block.
A user agent is a string that your browser sends to a website to let it know
what technology you are using.

For example, this is the user agent of my Firefox browser:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 AlexaToolbar/alxf-2.01 Firefox/3.6.13 GTB7.1
And of my Internet Explorer:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

We can send different user agents from SerpScraper by just pasting a list into the box.
Download a list of user agents, paste a few into the User Agents field and tick Shuffle.
Just a few will do; there is no need to go mad and have millions unless you are playing and experimenting.
List of Agents here.


