SDK下载
环境准备
- PHP 5.5+,可通过
php -v
命令查看当前的PHP版本。 - cURL 扩展,可通过
php -m
命令查看curl扩展是否已经安装好。
安装
有两种方式安装SDK:
- composer方式
- 源码方式
composer方式
您可以通过composer安装您的项目依赖,需要您在项目的根目录运行:
composer require shenjian/shenjian-sdk-php
或者在您的
composer.json
中声明对Shenjian SDK for PHP的依赖:{
"require": {
"shenjian/shenjian-sdk-php": "~1.0"
}
}通过composer install安装依赖,安装完成后引入依赖:
require_once __DIR__ . '/vendor/autoload.php';
注意:
- 如果您的项目中已经引用过
autoload.php
,则加入了SDK的依赖之后,不需要再引入autoload.php
了。 - 如果使用composer出现网络错误,可以使用composer中国区的镜像源,方法是在命令行执行:
composer config -g repositories.packagist composer http://packagist.phpcomposer.com
或参照镜像用法。
源码方式
- 使用SDK源码时,您需要在发布页面中选择相应版本并下载打包好的zip文件。
解压后的根目录中包含一个
autoload.php
文件,您需要在代码中引入这个文件:require_once '/path/to/oss-sdk/autoload.php';
基本使用
安装好 SDK 后,接下来介绍如何使用 SDK。在使用 SDK 之前,需要有一个网多云账号,并登录用户中心获取一对有效的UserKey和UserSecret。
获取应用列表
下面的代码可以获取应用列表:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$app_list = $shenjian_client->getAppList();
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
foreach ($app_list as $app){
print("AppId: " . $app->getAppId() . "\n");
print("Info: " . $app->getInfo() . "\n");
print("Name: " . $app->getName() . "\n");
print("Type: " . $app->getType() . "\n");
print("Status: " . $app->getStatus() . "\n");
print("TimeCreate: " . $app->getTimeCreate() . "\n");
echo("\n");
}
账户示例
获取账户余额
下面的代码可以获取账户余额,余额是float类型并保留两位小数:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$balance = $shenjian_client->getMoney();
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Balance: " . $balance);
获取node信息
下面的代码可以获取用户节点的总数和正在运行的节点数:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$node = $shenjian_client->getNode();
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Node All: " . $node->getNodeAll() . "\n");
print("Node Running: " . $node->getNodeRunning() . "\n");
print("Node Gpu All: " . $node->getNodeGpuAll() . "\n");
print("Node Gpu Running: " . $node->getNodeGpuRunning());
应用示例
获取应用列表
下面的代码可以获取账号下的所有类型应用,参数 page
默认值是 1 ,page_size
默认值是 50 ,$params
可不传:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$params['page'] = 1;
$params['page_size'] = 10;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$app_list = $shenjian_client->getAppList($params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
foreach ($app_list as $app){
print("AppId: " . $app->getAppId() . "\n");
print("Info: " . $app->getInfo() . "\n");
print("Name: " . $app->getName() . "\n");
print("Type: " . $app->getType() . "\n");
print("Status: " . $app->getStatus() . "\n");
print("TimeCreate: " . $app->getTimeCreate() . "\n");
echo("\n");
}
爬虫示例
获取爬虫列表
下面的代码可以获取账号下的爬虫列表,参数 page
默认值是 1 ,page_size
默认值是 50 ,$params
可不传:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$params['page'] = 1;
$params['page_size'] = 10;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$crawler_list = $shenjian_client->getCrawlerList($params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
foreach ($crawler_list as $crawler){
print("Crawler AppId: " . $crawler->getAppId() . "\n");
print("Crawler Info: " . $crawler->getInfo() . "\n");
print("Crawler Name: " . $crawler->getName() . "\n");
print("Crawler Type: " . $crawler->getType() . "\n");
print("Crawler Status: " . $crawler->getStatus() . "\n");
print("Crawler TimeCreate: " . $crawler->getTimeCreate() . "\n");
echo "\n";
}
创建爬虫
下面的代码可以创建一个爬虫:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$params['app_name'] = "<爬虫名称>";
$params['app_info'] = "<爬虫简介>";
$params['code'] = base64_encode("<爬虫代码>");
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$crawler = $shenjian_client->createCrawler($params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler AppId: " . $crawler->getAppId() . "\n");
print("Crawler Name: " . $crawler->getName() . "\n");
print("Crawler Status: " . $crawler->getStatus() . "\n");
print("Crawler TimeCreate: " . $crawler->getTimeCreate());
删除爬虫
下面的代码可以删除指定爬虫:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->deleteCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("DeleteCrawler: OK");
注意:爬虫删除后,爬取结果无法恢复,请谨慎调用。
修改爬虫信息
下面的代码可以修改指定爬虫的信息,包括爬虫名和爬虫简介,参数 app_name
和 app_info
都是不设置则不修改:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['app_name'] = "<爬虫应用名称>";
$params['app_info'] = "<爬虫应用描述>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->editCrawler($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("EditCrawler: OK");
获取爬虫的自定义项
下面的代码可以获取爬虫的自定义项详情:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$custom_list = $shenjian_client->configCrawlerCustomGet($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
if(is_array($custom_list)){
foreach ($custom_list as $custom){
print("Custom Key: " . $custom->getKey() . "\n");
print("Custom Name: " . $custom->getName() . "\n");
print("Custom Type: " . $custom->getType() . "\n");
print("Custom Cvalue:" . $custom->getCvalue() . "\n");
echo "\n";
}
}else{
print("Reason: " . $custom_list);
}
提示:数据返回成功时需要判断是否是数组,如果是数组则遍历获取每个自定义项的详情信息,不是数组则该爬虫没有自定义项,返回的是条string类型的说明信息。
设置爬虫的自定义项
下面的代码仅作为示例来修改爬虫的自定义设置,因为每个爬虫可设置的自定义项很可能不同,并且不同类型的自定义项,需要post的参数格式不一样:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['crawlStore'] = true;
$params['pageNum'] = 10;
$params['productUrl'] = "https://item.jd.com/3724805.php";
$params['keywords'] = ["男装","女装"];
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->configCrawlerCustom($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("ConfigCrawlerCustom: OK");
启动爬虫
下面的代码可以启动指定爬虫,参数 $params
可以不传,默认以1个节点启动;除了下面演示的可传参数,还有其它参数,详情请参考启动爬虫:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['node'] = 1;
$params['follow_new'] = true;
$params['follow_change'] = true;
$params['dup_type'] = 'change';
$params['change_type'] = 'update';
$params['timer_type'] = 'once';
$params['once_date_start'] = '2018-2-10';
$params['time_start'] = '10:00';
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->startCrawler($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);
停止爬虫
下面的代码可以停止指定爬虫:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->stopCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);
暂停爬虫
下面的代码可以暂停指定爬虫:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->pauseCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);
继续爬虫
下面的代码可以继续指定爬虫:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->resumeCrawler($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);
获取爬虫状态
下面的代码可以获取指定爬虫状态:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$status = $shenjian_client->getCrawlerStatus($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Status: " . $status);
获取爬虫速率
下面的代码可以获取指定爬虫速率:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$speed = $shenjian_client->getCrawlerSpeed($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Speed: " . $speed);
修改爬虫节点
下面的代码可以增加或减少指定爬虫所使用的节点,参数:node_delta
大于0表示增加,小于0表示减少,不能为0:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['node_delta'] = 3;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$node = $shenjian_client->changeCrawlerNode($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Node Running : " . $node->getNodeRunning() . "\n");
print("Node Left : " . $node->getNodeLeft());
获取数据信息
下面的代码可以获取爬虫对应的数据源信息:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key,$user_secret);
$source = $shenjian_client->getCrawlerSource($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Source AppId: " . $source->getAppId() . "\n");
print("Source Type: " . $source->getType() . "\n");
print("Source Count: " . $source->getCount());
清空爬虫数据
下面的代码可以清空爬虫的爬取结果:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->clearCrawlerSource($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("ClearCrawlerSource: OK");
注意:爬取结果清空后无法恢复,请谨慎调用。
删除爬虫数据
下面的代码可以删除爬虫N天前爬到的数据,参数 days
无默认值,最小为1:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['days'] = 3;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->deleteCrawlerSource($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("DeleteCrawlerSource: OK");
注意:此接口调用后会立即返回,删除数据在后台进行。此操作不可取消,爬取结果删除后无法恢复,请谨慎调用。
设置爬虫托管
下面的代码可以修改指定爬虫的托管设置,参数 proxyhost_typetype
是托管类型,0-不托管,1-阿里云OSS,2-七牛云存储,4-又拍云;参数 image
、 text
、 audio
和 application
非零数字或 true 都表示托管,不传表示不托管:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['host_type'] = 3;
$params['image'] = true;
$params['text'] = true;
$params['audio'] = true;
$params['video'] = true;
$params['application'] = true;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->configCrawlerHost($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("ConfigCrawlerHost: OK");
获取Webhook设置
下面的代码可以获取爬虫的Webhook设置:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$webhook = $shenjian_client->getCrawlerWebhook($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Crawler Webhook Url: " . $webhook->getUrl() . "\n");
print("Crawler Webhook Events: " . json_encode($webhook->getEvents()));
print("Crawler Webhook Gzip: " . $webhook->getGzip());
删除Webhook
下面的代码可以删除爬虫的Webhook设置:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->deleteCrawleWebhook($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("DeleteCrawleWebhook: OK");
修改Webhook
下面的代码可以修改爬虫的Webhook设置,参数 data_new
、 data_updated
和 msg_custom
非零数字或 true 都表示发送,不传表示不发送:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['url'] = urlencode("<webhook的通知地址,需要是能外网访问的地址>");
$params['data_new'] = true;
$params['data_updated'] = true;
$params['msg_custom'] = true;
$params['gzip'] = true;
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->setCrawlerWebhook($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("SetCrawlerWebhook: OK");
获取爬虫自动发布状态
下面的代码可以获取该爬虫对应数据源的自动发布状态:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$publish_status = $shenjian_client->getCrawlerPublishStatus($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("Publish Status: " . $publish_status);
开启自动发布
下面的代码可以开启该爬虫对应数据源的自动发布,参数 publish_id
是发布项ID(发布项目前只能通过网页创建,暂时不开放通过接口创建):$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
$params['publish_id'] = ["<发布项目的ID>", "<发布项目的ID>", "<发布项目的ID>"];
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->startCrawlerPublish($app_id, $params);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("StartCrawlerPublish: OK");
停止自动发布
下面的代码可以停止该爬虫对应数据源的自动发布:$user_key = "<您账号的UserKey>";
$user_secret = "<您账号的UserSecret>";
$app_id = "<爬虫ID>";
try{
$shenjian_client = new ShenjianClient($user_key, $user_secret);
$shenjian_client->stopCrawlerPublish($app_id);
}catch (ShenjianException $e){
printf($e->getMessage() . "\n");
return;
}
print("StopCrawlerPublish: OK");
代码许可
Copyright (c) 2020 网多软件科技
基于 Apache 协议发布: