PHP SDK

SDK下载

环境准备

  • PHP 5.5+,可通过php -v命令查看当前的PHP版本。
  • cURL 扩展,可通过php -m命令查看curl扩展是否已经安装好。

安装

有两种方式安装SDK:

  • composer方式
  • 源码方式

composer方式

  1. 您可以通过composer安装您的项目依赖,需要您在项目的根目录运行:

    composer require shenjian/shenjian-sdk-php
  2. 或者在您的 composer.json 中声明对Shenjian SDK for PHP的依赖:

    {
    "require": {
    "shenjian/shenjian-sdk-php": "~1.0"
    }
    }
  3. 通过composer install安装依赖,安装完成后引入依赖:

    require_once __DIR__ . '/vendor/autoload.php';

注意:

  • 如果您的项目中已经引用过 autoload.php ,则加入了SDK的依赖之后,不需要再引入 autoload.php 了。
  • 如果使用composer出现网络错误,可以使用composer中国区的镜像源,方法是在命令行执行:
    composer config -g repositories.packagist composer http://packagist.phpcomposer.com

或参照镜像用法

源码方式

  1. 使用SDK源码时,您需要在发布页面中选择相应版本并下载打包好的zip文件。
  2. 解压后的根目录中包含一个 autoload.php 文件,您需要在代码中引入这个文件:

    require_once '/path/to/oss-sdk/autoload.php';

基本使用

安装好 SDK 后,接下来介绍如何使用 SDK。在使用 SDK 之前,需要有一个网多云账号,并登录用户中心获取一对有效的UserKey和UserSecret

获取应用列表

下面的代码可以获取应用列表:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$app_list = $shenjian_client->getAppList();
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
foreach ($app_list as $app){
print("AppId: " . $app->getAppId() . "\n");
print("Info: " . $app->getInfo() . "\n");
print("Name: " . $app->getName() . "\n");
print("Type: " . $app->getType() . "\n");
print("Status: " . $app->getStatus() . "\n");
print("TimeCreate: " . $app->getTimeCreate() . "\n");
echo("\n");
}

账户示例

获取账户余额

下面的代码可以获取账户余额,余额是float类型并保留两位小数:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$balance = $shenjian_client->getMoney();
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Balance: " . $balance);

获取node信息

下面的代码可以获取用户节点的总数和正在运行的节点数:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$node = $shenjian_client->getNode();
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Node All: " . $node->getNodeAll() . "\n");
print("Node Running: " . $node->getNodeRunning() . "\n");
print("Node Gpu All: " . $node->getNodeGpuAll() . "\n");
print("Node Gpu Running: " . $node->getNodeGpuRunning());

应用示例

获取应用列表

下面的代码可以获取账号下的所有类型应用,参数 page 默认值是 1 ,page_size 默认值是 50 ,$params 可不传:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$params['page'] = 1;
$params['page_size'] = 10;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$app_list = $shenjian_client->getAppList($params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
foreach ($app_list as $app){
print("AppId: " . $app->getAppId() . "\n");
print("Info: " . $app->getInfo() . "\n");
print("Name: " . $app->getName() . "\n");
print("Type: " . $app->getType() . "\n");
print("Status: " . $app->getStatus() . "\n");
print("TimeCreate: " . $app->getTimeCreate() . "\n");
echo("\n");
}

爬虫示例

获取爬虫列表

下面的代码可以获取账号下的爬虫列表,参数 page 默认值是 1 ,page_size 默认值是 50 ,$params 可不传:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$params['page'] = 1;
$params['page_size'] = 10;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$crawler_list = $shenjian_client->getCrawlerList($params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
foreach ($crawler_list as $crawler){
print("Crawler AppId: " . $crawler->getAppId() . "\n");
print("Crawler Info: " . $crawler->getInfo() . "\n");
print("Crawler Name: " . $crawler->getName() . "\n");
print("Crawler Type: " . $crawler->getType() . "\n");
print("Crawler Status: " . $crawler->getStatus() . "\n");
print("Crawler TimeCreate: " . $crawler->getTimeCreate() . "\n");
echo "\n";
}

创建爬虫

下面的代码可以创建一个爬虫:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$params['app_name'] = "<爬虫名称>";
$params['app_info'] = "<爬虫简介>";
$params['code'] = base64_encode("<爬虫代码>");
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$crawler = $shenjian_client->createCrawler($params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler AppId: " . $crawler->getAppId() . "\n");
print("Crawler Name: " . $crawler->getName() . "\n");
print("Crawler Status: " . $crawler->getStatus() . "\n");
print("Crawler TimeCreate: " . $crawler->getTimeCreate());

删除爬虫

下面的代码可以删除指定爬虫:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->deleteCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("DeleteCrawler: OK");

注意:爬虫删除后,爬取结果无法恢复,请谨慎调用。

修改爬虫信息

下面的代码可以修改指定爬虫的信息,包括爬虫名和爬虫简介,参数 app_nameapp_info 都是不设置则不修改:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['app_name'] = "<爬虫应用名称>";
$params['app_info'] = "<爬虫应用描述>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->editCrawler($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("EditCrawler: OK");

获取爬虫的自定义项

下面的代码可以获取爬虫的自定义项详情:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$custom_list = $shenjian_client->configCrawlerCustomGet($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
if(is_array($custom_list)){
foreach ($custom_list as $custom){
print("Custom Key: " . $custom->getKey() . "\n");
print("Custom Name: " . $custom->getName() . "\n");
print("Custom Type: " . $custom->getType() . "\n");
print("Custom Cvalue:" . $custom->getCvalue() . "\n");
echo "\n";
}
}else{
print("Reason: " . $custom_list);
}

提示:数据返回成功时需要判断是否是数组,如果是数组则遍历获取每个自定义项的详情信息,不是数组则该爬虫没有自定义项,返回的是条string类型的说明信息。

设置爬虫的自定义项

下面的代码仅作为示例来修改爬虫的自定义设置,因为每个爬虫可设置的自定义项很可能不同,并且不同类型的自定义项,需要post的参数格式不一样:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['crawlStore'] = true;
$params['pageNum'] = 10;
$params['productUrl'] = "https://item.jd.com/3724805.php";
$params['keywords'] = ["男装","女装"];
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->configCrawlerCustom($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("ConfigCrawlerCustom: OK");

启动爬虫

下面的代码可以启动指定爬虫,参数 $params 可以不传,默认以1个节点启动;除了下面演示的可传参数,还有其它参数,详情请参考启动爬虫

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['node'] = 1;
$params['follow_new'] = true;
$params['follow_change'] = true;
$params['dup_type'] = 'change';
$params['change_type'] = 'update';
$params['timer_type'] = 'once';
$params['once_date_start'] = '2018-2-10';
$params['time_start'] = '10:00';
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->startCrawler($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);

停止爬虫

下面的代码可以停止指定爬虫:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->stopCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);

暂停爬虫

下面的代码可以暂停指定爬虫:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->pauseCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);

继续爬虫

下面的代码可以继续指定爬虫:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->resumeCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);

获取爬虫状态

下面的代码可以获取指定爬虫状态:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->getCrawlerStatus($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);

获取爬虫速率

下面的代码可以获取指定爬虫速率:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$speed = $shenjian_client->getCrawlerSpeed($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Speed: " . $speed);

修改爬虫节点

下面的代码可以增加或减少指定爬虫所使用的节点,参数:node_delta 大于0表示增加,小于0表示减少,不能为0:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['node_delta'] = 3;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$node = $shenjian_client->changeCrawlerNode($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Node Running : " . $node->getNodeRunning() . "\n");
print("Node Left : " . $node->getNodeLeft());

获取数据信息

下面的代码可以获取爬虫对应的数据源信息:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key,$user_secret);
$source = $shenjian_client->getCrawlerSource($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Source AppId: " . $source->getAppId() . "\n");
print("Source Type: " . $source->getType() . "\n");
print("Source Count: " . $source->getCount());

清空爬虫数据

下面的代码可以清空爬虫的爬取结果:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->clearCrawlerSource($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("ClearCrawlerSource: OK");

注意:爬取结果清空后无法恢复,请谨慎调用。

删除爬虫数据

下面的代码可以删除爬虫N天前爬到的数据,参数 days 无默认值,最小为1:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['days'] = 3;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->deleteCrawlerSource($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("DeleteCrawlerSource: OK");

注意:此接口调用后会立即返回,删除数据在后台进行。此操作不可取消,爬取结果删除后无法恢复,请谨慎调用。

设置爬虫托管

下面的代码可以修改指定爬虫的托管设置,参数 proxyhost_typetype 是托管类型,0-不托管,1-阿里云OSS,2-七牛云存储,4-又拍云;参数 imagetextaudioapplication 非零数字或 true 都表示托管,不传表示不托管:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['host_type'] = 3;
$params['image'] = true;
$params['text'] = true;
$params['audio'] = true;
$params['video'] = true;
$params['application'] = true;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->configCrawlerHost($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("ConfigCrawlerHost: OK");

获取Webhook设置

下面的代码可以获取爬虫的Webhook设置:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$webhook = $shenjian_client->getCrawlerWebhook($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Webhook Url: " . $webhook->getUrl() . "\n");
print("Crawler Webhook Events: " . json_encode($webhook->getEvents()));
print("Crawler Webhook Gzip: " . $webhook->getGzip());

删除Webhook

下面的代码可以删除爬虫的Webhook设置:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->deleteCrawleWebhook($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("DeleteCrawleWebhook: OK");

修改Webhook

下面的代码可以修改爬虫的Webhook设置,参数 data_newdata_updatedmsg_custom 非零数字或 true 都表示发送,不传表示不发送:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['url'] = urlencode("<webhook的通知地址,需要是能外网访问的地址>");
$params['data_new'] = true;
$params['data_updated'] = true;
$params['msg_custom'] = true;
$params['gzip'] = true;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->setCrawlerWebhook($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("SetCrawlerWebhook: OK");

获取爬虫自动发布状态

下面的代码可以获取该爬虫对应数据源的自动发布状态:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$publish_status = $shenjian_client->getCrawlerPublishStatus($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Publish Status: " . $publish_status);

开启自动发布

下面的代码可以开启该爬虫对应数据源的自动发布,参数 publish_id 是发布项ID(发布项目前只能通过网页创建,暂时不开放通过接口创建):

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['publish_id'] = ["<发布项目的ID>", "<发布项目的ID>", "<发布项目的ID>"];
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->startCrawlerPublish($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("StartCrawlerPublish: OK");

停止自动发布

下面的代码可以停止该爬虫对应数据源的自动发布:

$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->stopCrawlerPublish($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("StopCrawlerPublish: OK");

代码许可

Copyright (c) 2020 网多软件科技

基于 Apache 协议发布: