Preface

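A few days ago, a traffic spike on one of our production services triggered Sentinel's flow control, and users then reported slow responses. We traced the spike to a scheduled task; after the task was optimized and released, traffic returned to normal.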

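This is about as ordinary as production problems get, and most of us have probably been through one. In most cases the matter is dropped once the problem is fixed, which leaves the door open for the same incident to happen again and ultimately gives users a bad experience. That is why I think every production problem deserves a post-mortem. The purpose of a post-mortem is not to assign blame but to keep the same mistake from happening again. So let's analyze this one briefly. First, an oversight about business scale led the task to send an unreasonably large number of requests. Second, our flow control was crude: it had no early-warning mechanism, so we only found out after users were affected (that is, after flow control or circuit breaking had already triggered).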

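So what is the solution? The first line of defence is prevention at the business level, but that is not the focus here, so I won't go into it. The second is prediction: can we find out shortly before flow control is about to trigger, alert the people responsible, and let them step in early so that flow control or circuit breaking never fires? This cannot prevent every case, but it is still better than alerting only after flow control or circuit breaking has triggered.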
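The idea of warning before the limit is reached can be sketched in a few lines of plain Java. This is only an illustration: the class `EarlyWarningCheck`, its `Level` enum, and the sample numbers are hypothetical and are not part of Sentinel.

```java
/** Hypothetical helper, not part of Sentinel: classifies load against a limit and a warning ratio. */
final class EarlyWarningCheck {
    enum Level { OK, WARN, BLOCKED }

    private final double limit;      // configured flow-control threshold, e.g. QPS
    private final double warnRatio;  // fraction of the limit at which to alert, e.g. 0.8

    EarlyWarningCheck(double limit, double warnRatio) {
        if (limit <= 0 || warnRatio <= 0 || warnRatio >= 1) {
            throw new IllegalArgumentException("limit must be > 0 and warnRatio must be in (0, 1)");
        }
        this.limit = limit;
        this.warnRatio = warnRatio;
    }

    /** Returns WARN in the band between the warning threshold and the real limit. */
    Level classify(double currentQps) {
        if (currentQps > limit) {
            return Level.BLOCKED;
        }
        if (currentQps > limit * warnRatio) {
            return Level.WARN;
        }
        return Level.OK;
    }
}
```

With a limit of 100 and a ratio of 0.8, a load of 85 falls into the WARN band: users are not affected yet, but the owner can already be notified.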

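Since we were already using Alibaba's Sentinel for flow control, the implementation described here uses Sentinel's custom slot feature. The official Sentinel documentation covers custom slots in a single sentence plus a piece of demo code, and I ran into quite a few pitfalls while using the feature, so I am sharing the result here.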

If you are not familiar with Sentinel, take a look at it on GitHub and try it out before reading on. GitHub: https://github.com/alibaba/Sentinel [1]

If you want to get familiar with custom slots, I recommend first reading about how Sentinel works: https://github.com/alibaba/Sentinel/wiki/Sentinel%E5%B7%A5%E4%BD%9C%E4%B8%BB%E6%B5%81%E7%A8%8B [2]

The demo in the source tree also shows how to write a custom slot: https://github.com/alibaba/Sentinel/tree/master/sentinel-demo/sentinel-demo-slot-chain-spi [3]

Implementation

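Below is the implementation of the Sentinel early-warning feature. The prerequisite is that your system already uses Sentinel's flow control or circuit breaking.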

1. Write a CustomSlotChainBuilder that implements the SlotChainBuilder interface. Its main job is to add our custom slots to the SlotChain.
import com.alibaba.csp.sentinel.slotchain.ProcessorSlotChain;
import com.alibaba.csp.sentinel.slotchain.SlotChainBuilder;
import com.alibaba.csp.sentinel.slots.DefaultSlotChainBuilder;

/**
 * Custom slot chain builder
 *
 * @author chenhao
 */
public class CustomSlotChainBuilder implements SlotChainBuilder {
    @Override
    public ProcessorSlotChain build() {
        // Start from the default chain so the built-in slots keep working.
        ProcessorSlotChain chain = new DefaultSlotChainBuilder().build();
        // Append the two early-warning slots. The original version also passed a
        // Spring-managed SentinelProperties bean to each slot; the slot classes
        // listed in this post have no such constructor, so they are created directly.
        chain.addLast(new FlowEarlyWarningSlot2());
        chain.addLast(new DegradeEarlyWarningSlot2());
        return chain;
    }
}
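To make the role of `addLast` and `fireEntry` easier to picture, here is a miniature chain of responsibility in plain Java. It only mimics the shape of Sentinel's slot chain; `MiniSlot`, `NamedSlot`, and `MiniSlotChain` are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature slot; it only mimics the shape of a Sentinel processor slot. */
abstract class MiniSlot {
    private MiniSlot next;

    void setNext(MiniSlot next) {
        this.next = next;
    }

    /** Does this slot's work, then calls fireEntry to hand over to the next slot. */
    abstract void entry(String resource, List<String> trace);

    /** Passes the call on to the next slot, if there is one. */
    protected void fireEntry(String resource, List<String> trace) {
        if (next != null) {
            next.entry(resource, trace);
        }
    }
}

/** A slot that records its name so the invocation order becomes visible. */
final class NamedSlot extends MiniSlot {
    private final String name;

    NamedSlot(String name) {
        this.name = name;
    }

    @Override
    void entry(String resource, List<String> trace) {
        trace.add(name);
        fireEntry(resource, trace);
    }
}

/** Singly linked chain; addLast appends behind everything added so far. */
final class MiniSlotChain {
    private MiniSlot first;
    private MiniSlot last;

    void addLast(MiniSlot slot) {
        if (first == null) {
            first = slot;
        } else {
            last.setNext(slot);
        }
        last = slot;
    }

    /** Runs the chain and returns the names of the slots in the order they were entered. */
    List<String> enter(String resource) {
        List<String> trace = new ArrayList<>();
        if (first != null) {
            first.entry(resource, trace);
        }
        return trace;
    }
}
```

Because the warning slots are appended last, they run after the built-in flow and degrade slots. A request that a built-in slot has already blocked never reaches them, which is fine here since the warning is meant to fire before the real threshold is hit.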

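2. Write the two early-warning slots, one for flow control and one for circuit breaking (named FlowEarlyWarningSlot2 and DegradeEarlyWarningSlot2 in the listings below).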

Custom FlowEarlyWarningSlot

import com.alibaba.csp.sentinel.context.Context;
import com.alibaba.csp.sentinel.node.DefaultNode;
import com.alibaba.csp.sentinel.slotchain.AbstractLinkedProcessorSlot;
import com.alibaba.csp.sentinel.slotchain.ResourceWrapper;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleChecker;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleUtil;
import com.alibaba.csp.sentinel.util.AssertUtil;
import com.google.common.collect.Lists;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeanUtils;
import org.springframework.util.CollectionUtils;
 
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
 
/**
 * Flow-control early-warning slot
 *
 * @author chenhao
 */
public class FlowEarlyWarningSlot2 extends AbstractLinkedProcessorSlot<DefaultNode> {
 
    /**
     * log
     */
    private Logger logger = LoggerFactory.getLogger(this.getClass());
 
    private final FlowRuleChecker checker;
 
    public FlowEarlyWarningSlot2() {
        this(new FlowRuleChecker());
    }
 
    /**
     * Package-private for test.
     *
     * @param checker flow rule checker
     * @since 1.6.1
     */
    FlowEarlyWarningSlot2(FlowRuleChecker checker) {
        AssertUtil.notNull(checker, "flow checker should not be null");
        this.checker = checker;
    }
 
 
    private List<FlowRule> getRuleProvider(String resource) {
        // Flow rule map should not be null.
        List<FlowRule> rules = FlowRuleManager.getRules();
        List<FlowRule> earlyWarningRuleList = Lists.newArrayList();
        for (FlowRule rule : rules) {
            FlowRule earlyWarningRule = new FlowRule();
            BeanUtils.copyProperties(rule, earlyWarningRule);
            // Scale the rule threshold to 80% of the original to get an early warning.
            // Consider making the 0.8 configurable.
            earlyWarningRule.setCount(rule.getCount() * 0.8);
            earlyWarningRuleList.add(earlyWarningRule);
        }
        Map<String, List<FlowRule>> flowRules = FlowRuleUtil.buildFlowRuleMap(earlyWarningRuleList);
        return flowRules.get(resource);
    }
 
    /**
     * get origin rule
     *
     * @param resource
     * @return
     */
    private FlowRule getOriginRule(String resource) {
        List<FlowRule> originRule = FlowRuleManager.getRules().stream().filter(flowRule -> flowRule.getResource().equals(resource)).collect(Collectors.toList());
        if (CollectionUtils.isEmpty(originRule)) {
            return null;
        }
        return originRule.get(0);
    }
 
    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count, boolean prioritized, Object... args)
            throws Throwable {
        String resource = context.getCurEntry().getResourceWrapper().getName();
        List<FlowRule> rules = getRuleProvider(resource);
        if (rules != null) {
            for (FlowRule rule : rules) {
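                // The rules fetched here carry 80% of the configured threshold, so failing this check means
                // traffic has reached 80% of the real threshold and the owner can be alerted.
                if (!checker.canPassCheck(rule, context, node, count, prioritized)) {
                    FlowRule originRule = getOriginRule(resource);
                    String originRuleCount = originRule == null ? "unknown" : String.valueOf(originRule.getCount());
                    logger.info("FlowEarlyWarning: traffic for service {} has exceeded {}, approaching the configured flow-control threshold: {}", resource, rule.getCount(), originRuleCount);
                    // TODO: implement the alerting yourself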
                    break;
                }
            }
        }
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
 
    @Override
    public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
        fireExit(context, resourceWrapper, count, args);
    }
}
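The comment in the listing suggests making the 0.8 configurable. One possible way, sketched here under assumptions, is to read the ratio from a JVM system property and fall back to 0.8 when the value is missing or invalid. The class `WarningRatio` and the property name `sentinel.earlywarning.ratio` are made up for this example and are not Sentinel settings.

```java
/** Hypothetical helper: resolves the early-warning ratio, falling back to 0.8. */
final class WarningRatio {
    /** Assumed property name for this sketch; Sentinel itself does not read it. */
    static final String PROPERTY = "sentinel.earlywarning.ratio";
    static final double DEFAULT_RATIO = 0.8;

    private WarningRatio() {
    }

    /** Parses a raw setting; anything outside (0, 1) or unparsable yields the default. */
    static double parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_RATIO;
        }
        try {
            double ratio = Double.parseDouble(raw.trim());
            return (ratio > 0 && ratio < 1) ? ratio : DEFAULT_RATIO;
        } catch (NumberFormatException e) {
            return DEFAULT_RATIO;
        }
    }

    /** Reads the ratio from the JVM system property. */
    static double current() {
        return parse(System.getProperty(PROPERTY));
    }
}
```

In the slot, the hard-coded factor would then become `earlyWarningRule.setCount(rule.getCount() * WarningRatio.current());`, and the ratio could be set with `-Dsentinel.earlywarning.ratio=0.7` at startup.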

Custom DegradeEarlyWarningSlot

import com.alibaba.csp.sentinel.context.Context;
import com.alibaba.csp.sentinel.node.DefaultNode;
import com.alibaba.csp.sentinel.slotchain.AbstractLinkedProcessorSlot;
import com.alibaba.csp.sentinel.slotchain.ResourceWrapper;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;
import com.google.common.collect.Lists;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeanUtils;
import org.springframework.util.CollectionUtils;

import java.util.List;
import java.util.stream.Collectors;

/**
 * Circuit-breaking early-warning slot
 *
 * @author chenhao
 */
public class DegradeEarlyWarningSlot2 extends AbstractLinkedProcessorSlot<DefaultNode> {

    /**
     * log
     */
    private Logger logger = LoggerFactory.getLogger(this.getClass());

    /**
     * Essentially the same as the flow-control version; only the way the original rules are fetched differs.
     *
     * @param resource
     * @return
     */
    private List<DegradeRule> getRuleProvider(String resource) {
        // Rule list should not be null.
        List<DegradeRule> rules = DegradeRuleManager.getRules();
        List<DegradeRule> earlyWarningRuleList = Lists.newArrayList();
        for (DegradeRule rule : rules) {
            DegradeRule earlyWarningRule = new DegradeRule();
            BeanUtils.copyProperties(rule, earlyWarningRule);
            // Scale the threshold to 80% of the original, as in the flow-control slot.
            earlyWarningRule.setCount(rule.getCount() * 0.8);
            earlyWarningRuleList.add(earlyWarningRule);
        }
        return earlyWarningRuleList.stream().filter(rule -> resource.equals(rule.getResource())).collect(Collectors.toList());
    }

    /**
     * get origin rule
     *
     * @param resource
     * @return
     */
    private DegradeRule getOriginRule(String resource) {
        List<DegradeRule> originRule = DegradeRuleManager.getRules().stream().filter(rule -> rule.getResource().equals(resource)).collect(Collectors.toList());
        if (CollectionUtils.isEmpty(originRule)) {
            return null;
        }
        return originRule.get(0);
    }

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count, boolean prioritized, Object... args)
            throws Throwable {
        String resource = context.getCurEntry().getResourceWrapper().getName();
        List<DegradeRule> rules = getRuleProvider(resource);
        if (rules != null) {
            for (DegradeRule rule : rules) {
                if (!rule.passCheck(context, node, count)) {
                    DegradeRule originRule = getOriginRule(resource);
                    String originRuleCount = originRule == null ? "unknown" : String.valueOf(originRule.getCount());
                    logger.info("DegradeEarlyWarning: the circuit-breaking metric for service {} has exceeded {}, approaching the configured circuit-breaking threshold: {}", resource, rule.getCount(), originRuleCount);
                    break;
                }
            }
        }
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

    @Override
    public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
        fireExit(context, resourceWrapper, count, args);
    }
}
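Both slots run on every request, so once the load sits above the warning threshold the log line (and any alert hooked in there) would fire for each request. A small throttle that lets one alert per resource through per interval may help. The sketch below is an assumption about how that could look; `AlertThrottle` is a hypothetical class and the clock value is passed in explicitly to keep it testable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that lets at most one alert per resource through per interval. */
final class AlertThrottle {
    private static final long NEVER = Long.MIN_VALUE;

    private final long minIntervalMillis;
    private final ConcurrentMap<String, AtomicLong> lastSent = new ConcurrentHashMap<>();

    AlertThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if the caller should send an alert for this resource now. */
    boolean tryAcquire(String resource, long nowMillis) {
        AtomicLong last = lastSent.computeIfAbsent(resource, key -> new AtomicLong(NEVER));
        long previous = last.get();
        boolean due = previous == NEVER || nowMillis - previous >= minIntervalMillis;
        // compareAndSet makes sure only one of several concurrent callers wins the slot.
        return due && last.compareAndSet(previous, nowMillis);
    }
}
```

Inside the slot, the alert call would be wrapped as `if (throttle.tryAcquire(resource, System.currentTimeMillis())) { ... }`, with `throttle` held in a field of the slot.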

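3. Under the resources folder, create the META-INF/services folder and add a file named com.alibaba.csp.sentinel.slotchain.SlotChainBuilder (the file name must be exactly the fully qualified name of the interface, otherwise the SPI lookup will not find it) with the following content: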

# Put the fully qualified class name of your CustomSlotChainBuilder here
com.xxx.sentinel.CustomSlotChainBuilder
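If the custom builder does not seem to be picked up, it can help to check which provider files are actually visible on the classpath. The following is a diagnostic sketch using only the JDK; `SpiRegistrationCheck` is a hypothetical helper, and it simply reads the `META-INF/services/<interface>` files the same way a service lookup would locate them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic: lists the provider class names registered for a service interface. */
final class SpiRegistrationCheck {
    private SpiRegistrationCheck() {
    }

    static List<String> registeredProviders(String serviceInterface, ClassLoader loader) throws IOException {
        List<String> providers = new ArrayList<>();
        // The provider file must be named after the fully qualified interface name.
        Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceInterface);
        while (files.hasMoreElements()) {
            URL file = files.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Everything after '#' is a comment in provider files.
                    int comment = line.indexOf('#');
                    String name = (comment >= 0 ? line.substring(0, comment) : line).trim();
                    if (!name.isEmpty()) {
                        providers.add(name);
                    }
                }
            }
        }
        return providers;
    }
}
```

Calling `SpiRegistrationCheck.registeredProviders("com.alibaba.csp.sentinel.slotchain.SlotChainBuilder", Thread.currentThread().getContextClassLoader())` at startup should list your builder's class name; an empty list suggests the file is misnamed or not packaged.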

That is basically it. I did hit quite a few pitfalls along the way; here are a couple:

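  • Changing the count property of a FlowRule directly does not work, because the rule check uses the FlowRule's controller property, which is private. So take the original configuration first and regenerate the rules through FlowRuleUtil.
  • While debugging, many of the values returned by DefaultNode's methods are only valid for 1 s, so a value seen in method A may be gone by the time you step into method B. That left me completely baffled at first.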
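The second pitfall is easier to reason about with a toy model. The counter below keeps its value only for the current one-second window and uses an injectable clock; it is a deliberate simplification written for this post, not Sentinel's actual statistics implementation, and `OneSecondCounter` is a hypothetical name.

```java
import java.util.function.LongSupplier;

/** Hypothetical counter whose value is only valid within the current one-second window. */
final class OneSecondCounter {
    private static final long WINDOW_MILLIS = 1000L;

    private final LongSupplier clockMillis;
    private long windowStart;
    private long count;

    OneSecondCounter(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.windowStart = currentWindowStart();
    }

    private long currentWindowStart() {
        long now = clockMillis.getAsLong();
        return now - now % WINDOW_MILLIS;
    }

    /** Resets the count once the clock has moved into a new window. */
    private void rollIfExpired() {
        long current = currentWindowStart();
        if (current != windowStart) {
            windowStart = current;
            count = 0;
        }
    }

    synchronized void add(long n) {
        rollIfExpired();
        count += n;
    }

    synchronized long get() {
        rollIfExpired();
        return count;
    }
}
```

Sitting on a breakpoint for more than a second has the same effect as advancing the clock in this model: the value that was there in method A has been reset by the time method B reads it. Logging the values instead of stepping through them avoids the surprise.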

Closing notes

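I rarely write technical posts like this, so if you spot any problems or anything that isn't rigorous, please point them out. Go easy on me, haha.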

References

[1] https://github.com/alibaba/Sentinel

[2] https://github.com/alibaba/Sentinel/wiki/Sentinel%E5%B7%A5%E4%BD%9C%E4%B8%BB%E6%B5%81%E7%A8%8B

[3] https://github.com/alibaba/Sentinel/tree/master/sentinel-demo/sentinel-demo-slot-chain-spi
