Entries Tagged as 'Books'

Doin’ the Things a Spider Can!

Amazing Fantasy #16 (Dec. 1995). Painted cover by Paul Lee. Image via Wikipedia

This is a quick post just to say that I’m alive. While I was browsing the Library of Congress’ blog, I came across this awesome post: The Library of Congress acquired 24 pages of original 1962 drawings from “Amazing Fantasy #15.”

Amazing Fantasy #15 was the first appearance of Spider-Man, and it spawned one of the most prolific comic book characters of all time. Although I’ve never really been a big fan of Spider-Man, the thought of seeing these pages in person makes my fatty little fanboy heart dance with glee.

If the Library of Congress deemed these pages worthy of inclusion, perhaps that means comic books have finally become a ‘legitimate’ art form.


Website Scraping for Dummies

For the last week, my interest has been aimed at website scraping. Wikipedia defines website scraping as:

“a technique in which a computer program extracts data from the display output of another program. The program doing the scraping is called a screen scraper. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing.”

Website scraping has traditionally been the domain of the Black Hat internet marketer, although there are plenty of White Hat applications for it as well. I’m interested in it as a way to build a snail mail list for my wife to use in her new business.
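For a mailing list like this, the job boils down to pulling names and addresses out of markup that was meant for human eyes. Here's a minimal sketch in Python using only the standard library's `html.parser`. The page layout (a `div class="listing"` holding a `span class="name"` and a `span class="address"`) and the `ListingParser` and `listings_to_csv` names are hypothetical; any real directory will need its own selectors, and its terms of use are worth checking first.

```python
import csv
import io
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collects (name, address) pairs from a HYPOTHETICAL directory page where each entry is
    <div class="listing"> containing <span class="name"> and <span class="address">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, address) tuples
        self._field = None   # "name" or "address" while inside one of those spans
        self._current = {}   # text pieces gathered for the listing being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "listing" in classes:
            self._current = {}
        elif tag == "span":
            for field in ("name", "address"):
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            # A listing just closed: join multi-line text (e.g. split by <br>) with commas.
            self.rows.append((", ".join(self._current.get("name", [])),
                              ", ".join(self._current.get("address", []))))
            self._current = {}

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current.setdefault(self._field, []).append(text)

def listings_to_csv(html_text):
    """Parse the page and return the listings as CSV text with a header row."""
    parser = ListingParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "address"])
    writer.writerows(parser.rows)
    return out.getvalue()

# Made-up sample page for trying it out.
SAMPLE = """
<div class="listing"><span class="name">Acme Bakery</span>
  <span class="address">12 Elm St<br>Springfield</span></div>
<div class="listing"><span class="name">Bolt Hardware</span>
  <span class="address">9 Oak Ave</span></div>
"""
print(listings_to_csv(SAMPLE))
```

The resulting CSV can be opened in a spreadsheet or handed to a mail-merge tool for printing labels.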

I’ve found a lot of resources, but sadly, most of them seem to be geared towards programmers. I can edit PHP, and sometimes cobble things together with copy and paste, but beyond that I’ve never had much luck learning how to program, mainly because of time constraints and an inability to dedicate myself to one language.

Although I’ve found a ton of information on website scraping, I’m going to limit myself to a shortish list.

Website Scraping Platforms

Web Harvest is an open source, Java-based platform geared towards website data extraction. As they put it, Web Harvest “offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content.”
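To get a feel for the XPath-style extraction that tools like Web Harvest build on, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which understands a limited subset of XPath. It is not Web Harvest's own configuration format. The page, the `products` table, and its `name` and `price` cells are made up, and the sketch assumes the page is already well-formed XHTML; real-world HTML usually has to be cleaned into valid XML before an XML toolchain can read it.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL page that is already well-formed XHTML.
PRODUCT_PAGE = """<html><body>
<table id="products">
  <tr><td class="name">Widget</td><td class="price">4.99</td></tr>
  <tr><td class="name">Gadget</td><td class="price">12.50</td></tr>
</table>
</body></html>"""

def extract_products(xhtml):
    """Return (name, price) pairs from the rows of the table with id="products"."""
    root = ET.fromstring(xhtml)
    rows = []
    # ElementTree supports a limited XPath subset, including [@attr='value'] predicates.
    for tr in root.findall(".//table[@id='products']/tr"):
        name = tr.find("td[@class='name']")
        price = tr.find("td[@class='price']")
        if name is not None and price is not None:
            rows.append((name.text, float(price.text)))
    return rows

print(extract_products(PRODUCT_PAGE))
```

The point of the path expressions is that they describe *where* the data sits in the page, so the same two lines keep working no matter how many rows the table has.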

Web Harvest looks to be extremely powerful and flexible, and it’s free, which is always nice. If you’re able to write code in Java, you may want to look at it pretty closely.

The Twit88 blog has two excellent tutorials on using Java and Web Harvest to extract data from websites: “Web Scraping using Web Harvest” and “Java - Writing a Web Page Scraper or Web Data Extraction Tool.”

Thanks to MIT’s SIMILE Project, you can use two of their programs, Piggy Bank and Solvent, to turn your copy of Mozilla Firefox into a data scraping platform. Both plugins are free under the BSD License and come with sample scrapers to help you get started.

 

Data Scraping With PHP

Sunil Bhatia has an article on writing website scrapers in PHP. His tutorial goes through the basics and is written with newbies in mind, which makes it an excellent stepping stone for aspiring programmers such as myself.
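The basic fetch-then-extract loop that beginner tutorials walk through looks roughly like this. It's sketched in Python rather than PHP and is not a translation of Sunil Bhatia's code; the `fetch_page` and `extract_links` helpers and the `USER_AGENT` string are hypothetical names. The regular expression is deliberately crude: it only handles double-quoted `href` attributes and will trip over unusual markup.

```python
import re
import urllib.request

# HYPOTHETICAL User-Agent string; identify your scraper honestly.
USER_AGENT = "my-little-scraper/0.1"

def fetch_page(url, timeout=10):
    """Download a page and decode it, falling back to UTF-8 if no charset is declared."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

# Crude pattern for <a ... href="..."> links: fine for a first experiment, brittle in general.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def extract_links(html_text):
    """Return (url, link text) pairs, with any tags inside the link text removed."""
    return [(href, re.sub(r"<[^>]+>", "", text).strip())
            for href, text in LINK_RE.findall(html_text)]
```

Usage would be something like `extract_links(fetch_page("https://example.com/"))`, which returns the links found on that page. Keeping the download and the extraction in separate functions makes it easy to test the extraction on saved HTML without hitting the site over and over.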

Yahoo! Pipes proves its power and flexibility once again as Day explains how to use the Fetch Page module to make a web scraper. This may be just the trick for making feeds from Yahoo! Buzz or eBay Pulse.
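As I understand it, the Fetch Page module works by cutting out the part of a page between two text markers and splitting that chunk into items on a delimiter, which Pipes can then publish as a feed. The sketch below imitates that idea in Python; `cut_and_split` and `to_rss` are hypothetical helpers, the tag-stripping step is my own addition, and the output is a bare-bones RSS 2.0 document rather than anything Pipes itself would produce.

```python
import re
import xml.etree.ElementTree as ET

def cut_and_split(page, start, end, delimiter):
    """Keep the text between the first `start` marker and the next `end` marker,
    split it on `delimiter`, strip any tags, and drop empty chunks."""
    i = page.find(start)
    if i == -1:
        return []
    i += len(start)
    j = page.find(end, i)
    if j == -1:
        return []
    chunks = (re.sub(r"<[^>]+>", "", c).strip() for c in page[i:j].split(delimiter))
    return [c for c in chunks if c]

def to_rss(title, link, items):
    """Wrap each chunk in a minimal RSS 2.0 <item> element."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Items scraped from " + link
    for chunk in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = chunk
    return ET.tostring(rss, encoding="unicode")

# Usage with a made-up page fragment: only the first list is kept.
SAMPLE_PAGE = '<ul id="top"><li>Alpha</li><li>Beta</li></ul><ul id="other"><li>Nope</li></ul>'
ITEMS = cut_and_split(SAMPLE_PAGE, '<ul id="top">', "</ul>", "</li>")
FEED = to_rss("Top list", "http://example.com/", ITEMS)
print(FEED)
```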

Finally, I found a bunch of specialized website scrapers and programming libraries at Schrenk.com. The scripts are meant to be used in conjunction with the book “Webbots, Spiders, and Screen Scrapers” by Michael Schrenk, but I think they’d also be a good starting point for anyone with a little programming knowledge.

Geeking Out

This post is going to deal entirely with comic books and the movies that are based on them. If you’re not interested in either, then stay tuned for more money-making posts.

According to Empire Online, Ryan Reynolds (Waiting, Blade 3) has joined the cast of the new Wolverine movie. Reynolds will be playing Deadpool and fighting alongside Wolverine (Hugh Jackman). I’m actually pretty disappointed by the news. As much as I like Ryan Reynolds, I was hoping that the Wolverine movie would follow the Weapon X storyline, which is amazing.

Apparently Warner Brothers and Leonardo DiCaprio are planning to film a live action version of Katsuhiro Otomo’s groundbreaking cyberpunk tale Akira. I was taken aback at first; however, short of Orlando Bloom signing up, Leo is one of the most effeminate/androgynous American movie stars I can think of, which makes him perfect for a live action remake of an anime movie.

This news comes on the heels of my finishing the sixth Akira graphic novel: 2,500+ pages in 7 days. It’s an amazing series, and this is the second time I’ve read it all the way through. While the movie will always have a special place in my heart, after reading the manga in all its War and Peace-esque length, I just can’t bring myself to watch it again. I am impressed that they are planning on doing two movies, but there’s no way that live action special effects can compare to the animation, especially when Tetsuo loses control of his power and becomes a Godzilla-sized blob. I cannot envision that scene, rendered in CGI, looking anything other than crappy.

I got the tip-off this morning that Columbia Pictures has optioned “The Boys” for a big screen adaptation. I have been a huge fan of Garth Ennis’ work on the “Preacher” series of graphic novels for some time now, and was recently introduced to “The Boys.”

The basic premise is that there’s a secret outfit within the CIA that keeps track of superheroes and, if necessary, reins them in. Blackmail, extortion, murder: nothing is too dirty or underhanded for The Boys to stoop to in fulfilling their mission. The stories are well crafted and chock full of Ennis’ trademark black humor and sexual perversion.

I picked it up to read the foreword and ended up reading the whole thing in one sitting. My sides splitting with laughter, I literally couldn’t turn the pages fast enough to take in the story. The second graphic novel is due to be released in the next few months, and I’m looking forward to picking it up. I’ve been told on good authority that the Tek-Knight, Garth Ennis’ parody of Batman, is a real gutbuster.

My great fear with this adaptation is that they will mangle the casting. I don’t have any strong feelings about any of the major characters except for Hugh “Wee Hughie” Campbell. Wee Hughie was based, visually at least, on actor Simon Pegg (Shaun of the Dead, Hot Fuzz), and I’m afraid that he may get passed over in the casting for a more familiar nebbish actor, such as Steve Buscemi.

I was excited to see that HBO is working on a Preacher series. Then I saw the details. The plot synopsis states: An ex-priest turned professional gambler exposes murder and corruption in a small New Mexico town. Um, WTF? Did anyone involved with the series even read the source material?

Powers Boothe has been cast as Jesse Custer (the Preacher). Powers Boothe is a great actor, but the Custer character is a twenty-something shepherd to a redneck town in deepest Texas. That casting could work if the series starts at the end of the graphic novels (which I haven’t made it to) and tells the story in flashbacks. However, the absence of Cassidy, who is integral to the series, makes me believe that’s not the plan. They may have renamed the character, but why?

This makes me want to cry. I pray that HBO’s adaptation of George R.R. Martin’s “A Song of Ice and Fire” stays closer to the source material.

Finally, it appears that the “Priest” movie is still on, but details are scarce. I’m dreading this one. The manga tells the story of Ivan Isaacs, a priest who sold his soul to a demon to gain vengeance against one of the archdukes of hell. It’s a gritty, intense rollercoaster that grabs you by the front of your shirt and holds you tight throughout each 200-page installment.

The series originally ran for 26 issues in South Korea, but the English release seems to have stalled out. I got issue 14 two years ago, and I’m still waiting on issue 15. I’ve actually given up hope of ever finishing this series. It’s a shame, since I loved the merging of the horror and western genres. It was a bloody, brutal, and, most importantly, interesting series that ended too soon on this side of the ocean.