Author Topic: Scraping  (Read 1751 times)

Offline robertsala

Scraping
« on: December 30, 2014, 11:05:14 pm »
Hi guys!

I'm creating a website to sell products from an authorized wholesaler. I'm using OpenCart as the eCommerce platform, and the biggest challenge right now is scraping all the data: products, prices, categories, descriptions, etc. Basically, it's what is called dropshipping.

Things I need to consider:
Import all products from all available data feeds
Import all categories
Import both short and long product descriptions
Download product images

The wholesaler has an API, but I don't see ANYWHERE in OpenCart's admin panel to add it. Anyway, I bumped into a site called http://scrapy.org/
It says it's a framework for extracting the data I need from any website. Have you guys ever heard of this program, or do you know of any others you can recommend? Thanks in advance!
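
For what it's worth, if the wholesaler's API can return the catalog as JSON, the checklist above might be sketched roughly like this in plain Python. The feed layout and every field name here (`sku`, `short_description`, `image_url`, and so on) are assumptions made up for illustration, not the wholesaler's real schema, and the CSV at the end is just a neutral format to hand to whatever import tool or extension ends up being used:

```python
import csv
import io
import json

# Hypothetical feed: these field names are assumptions, not the
# wholesaler's real schema. Adjust them to whatever the API returns.
SAMPLE_FEED = """
[
  {"sku": "A100", "name": "Phone case", "price": "9.99",
   "category": "Accessories", "short_description": "Slim case",
   "description": "Slim protective case for phones.",
   "image_url": "http://example.com/img/a100.jpg"},
  {"sku": "B200", "name": "USB cable", "price": "4.50",
   "category": "Cables", "short_description": "1 m cable",
   "description": "One metre USB charging cable.",
   "image_url": "http://example.com/img/b200.jpg"}
]
"""

FIELDS = ["sku", "name", "price", "category",
          "short_description", "description", "image_url"]


def feed_to_rows(feed_text):
    """Parse the JSON feed and keep only the fields we want to import."""
    products = json.loads(feed_text)
    return [{field: item.get(field, "") for field in FIELDS} for item in products]


def categories(rows):
    """Distinct category names, sorted, so they can be created first."""
    return sorted({row["category"] for row in rows if row["category"]})


def rows_to_csv(rows):
    """Serialize the rows to CSV text for a bulk-import step."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = feed_to_rows(SAMPLE_FEED)
print(categories(rows))
print(rows_to_csv(rows))
```

The `image_url` column is only collected here; downloading the images would be a separate pass over those URLs.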

Website under development is
www.goprotech.mobi

Offline 10i

Re: Scraping
« Reply #1 on: December 31, 2014, 01:48:08 pm »
Hi, sorry I don't have any experience with this, but I do wish you the best of luck.
Running Peppermint 8- 64 bit on my Lenovo i3 laptop.

View my Linux blog:  http://myiceadventure.blogspot.com

Online VinDSL

Re: Scraping
« Reply #2 on: January 03, 2015, 11:48:55 am »
Interesting website!

It looks like they came up with some sort of Python framework for scraper sites.  Python would be great for that -- you won't have to write your own libs.

Depending on what you're trying to accomplish, you might not need an industrial strength framework like that.  Sure, if you're sucking data from 100s of websites for news aggregation, or whatevs, then something like that would be indispensable, but...

Personally, for the limited amount of scraping that I do on my websites, I just hard-code my scripts using cURL.  cURL is your friend.  I'm surprised more ppl don't use it.
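
To give a rough idea of what "hard-coded" means here, below is a minimal sketch of the same approach. Note that it is written in Python's standard library rather than cURL, so that it lines up with the Python framework mentioned above: `fetch()` does the part cURL would do, and a small parser pulls out the values. The `price` class name, the sample markup, and the `User-Agent` string are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PriceParser(HTMLParser):
    """Collect the text inside every tag whose class is exactly 'price'."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def fetch(url):
    """Download a page as text (the step cURL would handle)."""
    request = Request(url, headers={"User-Agent": "price-check-sketch"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_prices(html):
    """Return the price strings found in an HTML document."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


# Hypothetical markup; a real run would be extract_prices(fetch(some_url)).
SAMPLE = ('<ul><li><span class="price">$9.99</span></li>'
          '<li><span class="price">$4.50</span></li></ul>')
print(extract_prices(SAMPLE))
```

Since the selector is baked into the script, it breaks as soon as the target site changes its markup, which is fine for one or two pages and is exactly where a framework starts to pay off.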

As an aside, be very careful when you suck data from other websites!  I got a cease and desist order from a corporate legal department, some years ago, along with a demand for $30,000, for scraping a few stock exchange prices off their website and re-posting them on mine.  Just saying...

Anyway, I'll check into the website you linked more thoroughly later.  Thx  ;)
« Last Edit: January 03, 2015, 12:07:50 pm by VinDSL, Reason: Linkage »