Entries Tagged as 'google'

Website Scraping for Dummies

For the last week, I’ve been digging into website scraping. Wikipedia defines the underlying technique, screen scraping, as:

“a technique in which a computer program extracts data from the display output of another program. The program doing the scraping is called a screen scraper. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing.”

Website scraping has traditionally been the domain of the Black Hat internet marketer, although there are plenty of White Hat applications for website scraping as well. I’m interested in it to build a snail mail list for my wife to use in her new business.
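To make the idea concrete for a mailing list, here is a minimal sketch in Python using only the standard library. It is not taken from any of the tools or tutorials below. The sample page and the class names "listing", "name", and "address" are assumptions I made up for illustration; a real site will use its own markup, and you'd have to fetch the page first.

```python
from html.parser import HTMLParser

# Hypothetical page markup: the class names "listing", "name", and "address"
# are assumptions for illustration, not taken from any real site.
SAMPLE_HTML = """
<html><body>
  <div class="listing">
    <span class="name">Acme Bakery</span>
    <span class="address">12 Main St, Springfield</span>
  </div>
  <div class="listing">
    <span class="name">Bolt Hardware</span>
    <span class="address">48 Oak Ave, Shelbyville</span>
  </div>
</body></html>
"""


class ListingParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="address">."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (name, address) records
        self._field = None    # field currently being read, if any
        self._current = {}    # record being assembled

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "address"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            # A closing </div> ends one listing, so store the record.
            self.rows.append(
                (self._current.get("name", ""), self._current.get("address", ""))
            )
            self._current = {}


def scrape_listings(html):
    """Return a list of (name, address) tuples found in the HTML string."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for name, address in scrape_listings(SAMPLE_HTML):
        print(f"{name}: {address}")
```

The parser walks the page tag by tag and keeps only the text inside the spans it recognizes, which is the "pull data out of something meant for human eyes" part of the definition above.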

I’ve found a lot of resources, but sadly, they seem to be geared towards programmers. I can edit PHP, and sometimes cobble things together by copying and pasting, but outside of that, I’ve never had much luck learning how to program, mainly due to time constraints and an inability to dedicate myself to one language.

Although I’ve found a ton of information on website scraping, I’m going to limit myself to a shortish list.

Website Scraping Platforms

Web Harvest is an open source, Java-based platform geared towards website data extraction. As they put it, Web Harvest “offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content.”

Web Harvest looks to be extremely powerful and flexible, and it’s free, which is always nice. If you’re able to write code in Java, you may want to look at it pretty closely.

The Twit88 blog has two excellent tutorials on using Java/Web Harvest to extract data from websites: Web Scraping using Web Harvest, and Java - Writing a Web Page Scraper or Web Data Extraction Tool.

Thanks to MIT’s SIMILE Project, you can use two of their programs - Piggy Bank and Solvent - to turn your copy of Mozilla Firefox into a data scraping platform. Both plugins are free under the BSD License, and come with sample scrapers to help you get started.

Data Scraping With PHP

Sunil Bhatia has an article on writing website scrapers in PHP. His tutorial goes through the basics, and is written with newbies in mind. It’s an excellent stepping stone for aspiring programmers such as myself.

Yahoo! Pipes proves its power and flexibility once again as Day explains how to use the Fetch Page module to make a web scraper. This may be just the trick for building feeds from Yahoo! Buzz or eBay Pulse.

Finally, I found a bunch of specialized website scrapers and programming libraries at Schrenk.com. The scripts are meant to be used in conjunction with the book “Webbots, Spiders, and Screen Scrapers” by Michael Schrenk, but I think they’d also be a good starting point for anyone with a little programming knowledge.

Set AdWords Times to Save Some Dimes

Affiliate Marketing

Looking to save some money on your Google AdWords? According to Online Money Dot.com, your bids may actually be cheaper at night.

Jay at Online Opportunity has a pretty good tutorial on how to split test ads.
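For anyone wondering what the arithmetic behind a split test looks like, here is a rough sketch in Python. This is not Jay's method; it is a standard two-proportion z-test that I'm assuming is a reasonable way to compare two ads, and the function names and click counts are made up for illustration.

```python
import math


def click_through_rate(clicks, impressions):
    """Clicks divided by impressions, or 0.0 if the ad was never shown."""
    return clicks / impressions if impressions else 0.0


def split_test_z_score(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-score for ad B versus ad A.

    Positive values mean B has the higher click-through rate. As a common
    rule of thumb (an assumption, not a guarantee), an absolute value above
    about 1.96 corresponds to roughly 95% confidence.
    """
    rate_a = click_through_rate(clicks_a, views_a)
    rate_b = click_through_rate(clicks_b, views_b)
    total_views = views_a + views_b
    if total_views == 0 or views_a == 0 or views_b == 0:
        return 0.0
    # Pooled rate: what we'd expect if both ads performed the same.
    pooled = (clicks_a + clicks_b) / total_views
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (rate_b - rate_a) / std_error if std_error else 0.0


if __name__ == "__main__":
    # Made-up numbers: ad A got 30 clicks in 1000 views, ad B got 45 in 1000.
    print(click_through_rate(30, 1000), click_through_rate(45, 1000))
    print(split_test_z_score(30, 1000, 45, 1000))
```

The point of the z-score is simply to stop you from declaring a winner too early: a small gap on a few hundred impressions can easily be noise.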

Y! Store Tutorials has a “dummies” guide to building landing pages. If you can’t build a landing page after reading this, then abandon all hope.

Blogging / Writing

If you’ve been writing an ebook, or just thinking about it, Hendry Chang has a laundry list of 18 reasons to give it away. If you’ve been waffling about the fate of your ebook, then this might just push you over the edge on pricing.

Traffic / Search Engine Optimization

The eBusiness Banter Blog has posted a three-part tutorial on building traffic to your blog.
Part 1 | Part 2 | Part 3

Software Marketing Secrets has a good article explaining why you should be building brands for your products.

Affiliate Marketing, Link Building, Search Engine Optimization: Sunday Morning Free For All

Tools

Icon Interactive has Four Free Tools for your use: Link Popularity, Search Engine Submitter, Keyword Suggestion, and Word Cloud. Of them, I found the Link Popularity to be the most informative, whereas the Search Engine Submitter and Keyword Suggestion tools kept breaking on me. I guess you get what you pay for.

WordPress

Via Weblog Tools Collection: Blueprint Design Studio graces us with a list of their Top 10 Essential Plugins for WordPress. I’ve used a lot of the plugins on this list, but TinyMCE was always a pain in the arse to get working properly. Subscribe2 is what all of the big boys use on their sites, but for some reason PlugInstaller breaks it, so I’m out of the loop on that one too. Finally, CFormsII seems to be powerful in the right hands, which are obviously not mine - I’ve never had any joy in getting it to work.

Performancing has a great article on using WordPress to build a web directory, which is centered around two plugins: Alex Tang’s Link Directory plugin, and Links Back’s plugin WP Directory. I’m interested in trying this out, but according to Mr. Dash they’re both broken in WordPress 2.3.2, which means they’re probably just going to frustrate me. Still, it’s something I’m going to keep on the back burner.

Search Engine Optimization / Link Building

Blogging Mix has a two-part tutorial on how to get Google to crawl your website. Both parts are full of great ideas, and you’re probably already doing most of them, but a refresher course never hurts. The one thing I always do that isn’t on the list is adding my website feed to iGoogle. I remember reading somewhere that if you add your RSS to iGoogle it helps bump up your place in the queue. If your feed is provided by FeedBurner (a division of Google) then you should be doubly covered, right? Part 1 | Part 2

Build a Blog asks the question “To promote using blog directories, or not?”

Thanks to the post above, I was introduced to Skelliewag’s tutorial on Hansel and Gretel Link Building, which is a straightforward guide to getting quality incoming links.

Do Follow Directory is a directory of sites that have “do follow” enabled.

Info Doorway has a large list of “do follow” sites and forums arranged by Page Rank.

Eric Mitz tells us how he uses forums for backlinks.

Courtney Tuttle has a list of 102 ways to make your site a backlink superstar.

Micro Persuasion opines that we’re like a million monkeys on treadmills. Odd title aside, it’s a thought-provoking discussion on channels and internet trends of the past few years.

Affiliate Marketing

Squidoo Lens on using Squidoo for Affiliate Marketing

My Web 2.0 has 5 tips for creating powerful text ads.

AffiliateSeeking.com is a directory of the various programs by which you can become an affiliate marketer.

Paul updated his Affiliate Marketing Guide. Awesome advice from somebody who is making 6 figures a month.

This post has been a week in the making, so I hope it’s not a total deluge. I plan on adding a lot of these links to the main site as time permits this week.

Tuesday I go see a specialist for my hernia to find out when they want to perform surgery. I’m praying that it won’t be until after my wife returns to work. I’ve been dealing with this for several months now, so another month shouldn’t hurt as long as I take it easy. A little Aleve generally keeps me on my feet, and that’s all I need. If I do have to go in for surgery, expect posting to pick up dramatically.

I’m currently hatching several mini e-books, and when I finally hatch them, you’ll be the first to know.

Bikinis and SEO

What Sells Online: SEO With Long Tail Keywords.

Download “The Long Tail” ebook for free courtesy of Change This.

LongTail.com is the homepage of Chris Anderson, author of “The Long Tail” and is filled to the gills with lots of good ideas and observations on ‘free.’

Lawrence Lessig’s book The Future of Ideas has been released under a Creative Commons license.

SeoQuake is a Firefox extension that will show you Google PageRank, the number of pages indexed by Google/Yahoo/MSN, the age of the page, and a lot more. It’s an amazing free plugin that will help you spy on the competition.

Media Viper’s list of negative PPC keywords.

WHDB.com has a very thorough list of free alternatives to commercial software.

If you’re doing any kind of video marketing, DeskPing has a list of 5 royalty-free music sites.

If you’re looking to finance your next business venture, Mind of a Hustler suggests you try the stone soup method.

Dustin Brewer shares his thoughts about creating interesting content for social networks.

Saturday Morning Sweeties - NSFW!

This last bit is just for the guys. Egotastic was kind enough to bless us with Olivia Munn’s Complex Magazine photo shoot. If you have basic cable, and have ever stumbled across G4 (formerly TechTV), you’ll recognize her as the hostess of Attack of the Show. Honestly, she’s the primary reason I tune into the show, although Layla Kayleigh and Kristen Holt don’t hurt the eyes either.

Anyway, enjoy the eye candy, and I have a couple of big announcements coming down the pike this weekend. Stay Tuned!

More Entrecard Stuff, and Some Linkdumping

I noticed that Nikolai has added an Entrecard Blog Browser. That man, he’s addicted to Entrecarding, I swear ;)

In the comments to Nikolai’s post about his first set of Entrecard tools I found a link to the Entrecard Page Ranker at John is Fit. It takes the RSS feed of the last people to card you, and then ranks them by Google PageRank. That way you know who you really should be reciprocating with.
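Here is a rough Python sketch of how a tool like that could work. I haven't seen the Page Ranker's code, so this is only a guess at the general idea. The feed layout, the example.com URLs, and the HYPOTHETICAL_RANKS lookup table are all assumptions; a real tool would need some way to fetch PageRank values, which I'm not attempting here.

```python
import xml.etree.ElementTree as ET

# Made-up RSS feed of recent visitors. The real Entrecard feed format
# may differ; this layout is an assumption for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Recent droppers</title>
  <item><title>Blog A</title><link>http://a.example.com/</link></item>
  <item><title>Blog B</title><link>http://b.example.com/</link></item>
  <item><title>Blog C</title><link>http://c.example.com/</link></item>
</channel></rss>"""

# Hypothetical PageRank values keyed by URL. Stands in for whatever
# lookup a real tool would perform.
HYPOTHETICAL_RANKS = {
    "http://a.example.com/": 2,
    "http://b.example.com/": 5,
    "http://c.example.com/": 3,
}


def rank_visitors(feed_xml, ranks):
    """Return (rank, title, link) tuples sorted from highest rank to lowest.

    Sites missing from the lookup table are treated as rank 0.
    """
    root = ET.fromstring(feed_xml)
    visitors = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        visitors.append((ranks.get(link, 0), title, link))
    # Python's sort is stable, so ties stay in feed order.
    visitors.sort(key=lambda visitor: visitor[0], reverse=True)
    return visitors


if __name__ == "__main__":
    for rank, title, link in rank_visitors(SAMPLE_FEED, HYPOTHETICAL_RANKS):
        print(rank, title, link)
```

Read the feed, look up a score for each link, sort. Everything interesting lives in where the scores come from.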

Now for the Linkdump. No rhyme, but I have my reasons. Sorry if any of these are rehashes, I came to the game kinda late.

101 Ways to Make Money With DigitalPoint Forums: http://money.earnersclub.net/2007/09/02/101-ways-to-make-money-online-with-digitalpoint-forums/
Simple Blogging SEO Techniques: http://mixedmarketarts.com/2008/02/16/simple-blogging-seo-tweaks/
The Link Building Cookbook: http://mixedmarketarts.com/2007/11/13/the-link-building-cookbook/
Flypaper Resource Pages: http://www.jtpratt.com/2008/02/05/flypaper-resource-pages-how-to-get-100-times-more-traffic/
How to Turn Spam and Splog Into Backlinks and Gold: http://www.jtpratt.com/2007/11/01/how-to-turn-spam-and-splog-into-backlinks-and-gold/
The Super Affiliate’s Guide to PPC Marketing: http://zacjohnson.com/the-super-affiliates-guide-to-ppc-marketing/

A lot of these links have been posted to the main BookMark Money website, and some haven’t. Either way, it was time to clear out the open tabs in Firefox.