## How to save crawl results to MySQL

<http://www.jishubu.net/yunwei/python/424.html>

pyspider is a powerful and easy-to-use crawler framework, but by default it packs all scraped fields together and saves them in its own default database, which makes the data hard to integrate with other software. The requirement here is to save each scraped field as a separate column in a custom MySQL database. My skills are limited and I don't think this implementation is optimal; if you can improve it, please do, otherwise make do with it as it is.

Alternatively, download the script directly (saves pyspider results to a custom MySQL database): [mysqldb.zip](http://www.jishubu.net/wp-content/plugins/wp-ueditor/ueditor/php/upload/8521423797887.zip)

~~~
'''
Simple example of saving pyspider results to a database.

Usage:
    1. Put this file in pyspider/pyspider/database/mysql/ and name it mysqldb.py.
    2. Edit the database settings in this file and create the matching database and tables.
    3. In your script, import this code with
       from pyspider.database.mysql.mysqldb import SQL
    4. Override on_result, instantiate SQL and call replace()
       (first argument: table name; remaining keyword arguments: the result).

A simple example:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2015-01-26 13:12:04
# Project: jishubu.net

from pyspider.libs.base_handler import *
from pyspider.database.mysql.mysqldb import SQL


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://www.jishubu.net/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('p.pic a[href^="http"]').items():
            # Follow each link so that detail_page (and then on_result) runs.
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('HTML>BODY#presc>DIV.main>DIV.prices_box.wid980.clearfix>DIV.detail_box>DL.assort.tongyong>DD>A').text(),
        }

    def on_result(self, result):
        # Skip empty results and results without a title.
        if not result or not result['title']:
            return
        sql = SQL()
        sql.replace('info', **result)
'''
from six import itervalues
import mysql.connector
from mysql.connector import errorcode  # needed for the error checks in connect()


class SQL:

    username = 'pyspider'   # database user name
    password = 'pyspider'   # database password
    database = 'result'     # database name
    host = 'localhost'      # database host address
    connection = None       # shared connection, set by connect()
    auto_connect = True     # connect as soon as an instance is created
    placeholder = '%s'

    def __init__(self):
        if self.auto_connect:
            self.connect()

    def escape(self, string):
        # Quote a table or column name with backticks.
        return '`%s`' % string

    def connect(self):
        config = {
            'user': SQL.username,
            'password': SQL.password,
            'host': SQL.host,
        }
        if SQL.database is not None:
            config['database'] = SQL.database

        try:
            SQL.connection = mysql.connector.connect(**config)
            return True
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
                print("The credentials you provided are not correct.")
            elif err.errno == errorcode.ER_BAD_DB_ERROR:
                print("The database you provided does not exist.")
            else:
                print("Something went wrong: {}".format(err))
            return False

    def replace(self, tablename=None, **values):
        if SQL.connection is None:
            print("Please connect first")
            return False

        tablename = self.escape(tablename)
        if values:
            _keys = ", ".join(self.escape(k) for k in values)
            _values = ", ".join([self.placeholder, ] * len(values))
            sql_query = "REPLACE INTO %s (%s) VALUES (%s)" % (tablename, _keys, _values)
        else:
            # MySQL writes an all-defaults row as "() VALUES ()".
            sql_query = "REPLACE INTO %s () VALUES ()" % tablename

        cur = SQL.connection.cursor()
        try:
            if values:
                cur.execute(sql_query, list(itervalues(values)))
            else:
                cur.execute(sql_query)
            SQL.connection.commit()
            return True
        except mysql.connector.Error as err:
            print("An error occurred: {}".format(err))
            return False
        finally:
            cur.close()
~~~

## module: No module named mysqldb

MySQL Connector/Python downloads: `http://ftp.ntu.edu.tw/MySQL/Downloads/Connector-Python/`
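The script imports both `six` and `mysql.connector`, so when a "No module named ..." error shows up it may help to check which of those imports actually resolves in the interpreter that runs pyspider. Below is a minimal sketch of such a check; the helper name `module_available` is made up for this illustration and is not part of pyspider or Connector/Python.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Raised when the module (or one of its parent packages) is missing.
        return False
    return True


if __name__ == "__main__":
    # The two third-party modules that mysqldb.py imports.
    for name in ("six", "mysql.connector"):
        status = "found" if module_available(name) else "MISSING"
        print("{}: {}".format(name, status))
```

If `mysql.connector` is reported as missing, installing Connector/Python for the same Python interpreter that runs pyspider is the likely fix.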