Learning ByteCodeDL

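ByteCodeDL is another static analysis tool for Java bytecode. It combines two tools, soot-fact-generator and Souffle, into a declarative static analyzer.

For the difference between declarative and imperative programming, see http://www.aqee.net/post/imperative-vs-declarative/

soot-fact-generator produces the facts that Souffle consumes, in other words the data set.

Souffle then runs queries over those facts using the rules we supply (files with the .dl extension).

ByteCodeDL ships with a set of ready-made rules, such as call graph construction, CHA (class hierarchy analysis), PTA (pointer analysis) and P/Taint taint analysis. You can also write your own .dl files for whatever analysis you need. The rule files are at http://github.com/BytecodeDL/ByteCodeDL/tree/main/logic

For the theory behind Datalog-based program analysis, read the slides by Yue Li and Tian Tan first: http://pascal-group.bitbucket.io/lectures/Datalog.pdf

This post only walks through the docs to give readers a quick feel for ByteCodeDL.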

Environment

Install Souffle; see http://souffle-lang.github.io/install

sudo wget http://souffle-lang.github.io/ppa/souffle-key.public -O /usr/share/keyrings/souffle-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/souffle-archive-keyring.gpg] http://souffle-lang.github.io/ppa/ubuntu/ stable main" | sudo tee /etc/apt/sources.list.d/souffle.list
sudo apt update
sudo apt install souffle

Then download the soot-fact-generator.jar prebuilt by BytecodeDL.

souffle demo

Take the official example: http://souffle-lang.github.io/simple

Given an edge.facts file (its contents appear in the terminal session further down),

and an example.dl:

.decl edge(x:number, y:number)
.input edge

.decl path(x:number, y:number)
.output path

path(x, y) :- edge(x, y).
path(x, y) :- path(x, z), edge(z, y).

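The two .decl directives declare the relations edge and path. .input edge tells Souffle to read edge.facts from disk, and .output path tells it to write the result set to path.csv.

path(x, y) :- edge(x, y). means: if there is an edge x->y, then there is a path x->y.

path(x, y) :- path(x, z), edge(z, y). means: if there is a path from x to z and an edge from z to y, we can infer that there is also a path from x to y.

Let's run the query with souffle and look at the result.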

ubuntu@ubuntu:~$ cat edge.facts
1       2
2       3
ubuntu@ubuntu:~$ souffle -F. -D. example.dl
ubuntu@ubuntu:~$ cat path.csv
1	2
2	3
1	3

Three paths are output.
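To make the evaluation concrete, here is a minimal Python sketch of how the two path rules reach a fixpoint. It is only an illustration of naive bottom-up evaluation, not how Souffle is implemented; the function name transitive_closure is made up for this example.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the two path rules (illustrative only)."""
    # path(x, y) :- edge(x, y).
    path = set(edges)
    while True:
        # path(x, y) :- path(x, z), edge(z, y).
        derived = {(x, y) for (x, z) in path for (z2, y) in edges if z == z2}
        if derived <= path:
            # Nothing new was derived, so the fixpoint is reached.
            return path
        path |= derived


edges = {(1, 2), (2, 3)}  # same content as edge.facts
print(sorted(transitive_closure(edges)))  # [(1, 2), (1, 3), (2, 3)]
```

The result is the same three tuples as path.csv, just in sorted order.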

That is about the simplest possible demo. soot-fact-generator is what generates the facts.

soot-fact-generator

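The soot-fact-generator that ByteCodeDL provides comes from another static analysis framework, doop: http://bitbucket.org/yanniss/doop/src/master/generators/

doop is itself a tool that uses Souffle for Java pointer and taint analysis, and it comes with its own analysis rules: http://bitbucket.org/yanniss/doop/src/master/souffle-logic/

ByteCodeDL extracts doop's generator and uses doop's code to generate the facts.

It then implements its functionality with rules of its own.

In this section we use http://github.com/BytecodeDL/Benchmark to generate the facts data set.

Download http://github.com/BytecodeDL/Benchmark and build it with mvn package.

Run soot-fact-generator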

ubuntu@ubuntu:~$ java -jar soot-fact-generator.jar -i Benchmark-1.0-SNAPSHOT.jar -l /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/lib/rt.jar --generate-jimple --allow-phantom --full -d out
No logs directory set, using: out/logs
Logging initialized, using directory: out/logs
WARNING: 'file.encoding' property missing or not UTF8, please pass: -Dfile.encoding=UTF-8
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
WARNING: SSA not enabled, generating Jimple instead of Shimple
Preprocessing application: Benchmark-1.0-SNAPSHOT.jar
Preprocessing platform library: /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/lib/rt.jar
Adding archive: Benchmark-1.0-SNAPSHOT.jar
Adding archive for resolving: /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/lib/rt.jar
Classes in input (application) jar(s): 85
Total classes in Scene: 3695
Retrieved all bodies (time: 11)
Fact generation cores: 16
WARNING: some classes were not resolved, consider using thorough fact generation or adding them manually via --also-resolve: [sun.util.locale.provider.HostLocaleProviderAdapterImpl, java.lang.annotation.Inherited]
Found 74 phantom references. Rerun with '--report-phantoms' for more details.
Total classes (application, dependencies and SDK) to generate Jimple for: 3695
Soot: hierarchy_dirs set.
Methods without active bodies encountered (and reset): 0

The facts files are generated under the out directory

ubuntu@ubuntu:~$ find out/*.facts
out/Activity.facts
out/ActualParam.facts
out/AndroidApplication.facts
out/AndroidCallbackMethodName.facts
out/AndroidEntryPoint.facts
out/AndroidId.facts
out/AndroidIncludeXML.facts
out/AnnotationElement.facts
out/ApplicationClass.facts
out/ApplicationPackage.facts
out/ArrayAllocationConstSize.facts
out/ArrayAllocation.facts
out/ArrayInitialValueFromConst.facts
out/ArrayInitialValueFromLocal.facts
out/ArrayInsnIndex.facts
out/ArrayNumIndex.facts
out/ArrayType.facts
out/AssignBinop.facts
out/AssignCast.facts
out/AssignCastNull.facts
out/AssignCastNumConstant.facts
out/AssignHeapAllocation.facts
out/AssignInstanceOf.facts
out/AssignLocal.facts
out/AssignNull.facts
out/AssignNumConstant.facts
out/AssignOperFromConstant.facts
out/AssignOperFrom.facts
out/AssignPhantomInvoke.facts
out/AssignReturnValue.facts
out/AssignUnop.facts
out/BootstrapParam.facts
out/BreakpointStmt.facts
out/BroadcastReceiver.facts
out/Class-Artifact.facts
out/ClassHeap.facts
out/ClassModifier.facts
out/ClassType.facts
out/ComponentType.facts
out/ContentProvider.facts
out/DexInstructionAddressMap.facts
out/DirectSuperclass.facts
out/DirectSuperinterface.facts
out/DummyIfVar.facts
out/DynamicMethodInvocation.facts
out/DynamicMethodInvocation-ParamType.facts
out/EmptyArray.facts
out/EnterMonitor.facts
out/ExceptionHandler.facts
out/ExceptionHandler-FormalParam.facts
out/ExceptionHandler-Previous.facts
out/ExitMonitor.facts
out/Field-Annotation.facts
out/Field.facts
out/FieldInitialValue.facts
out/Field-Modifier.facts
out/FormalParam.facts
out/GenericField.facts
out/GenericType-ErasedType.facts
out/GenericTypeParameters.facts
out/Goto.facts
out/IfConstant.facts
out/If.facts
out/IfVar.facts
out/InterfaceType.facts
out/LayoutControl.facts
out/LoadArrayIndex.facts
out/LoadInstanceField.facts
out/LoadStaticField.facts
out/LookupSwitch-Default.facts
out/LookupSwitch.facts
out/LookupSwitch-Target.facts
out/Method-Annotation.facts
out/Method-DeclaresException.facts
out/Method.facts
out/MethodHandleConstant.facts
out/MethodInvocation-Line.facts
out/Method-Modifier.facts
out/MethodTypeConstant.facts
out/MethodTypeConstantParam.facts
out/NativeLibEntryPoint.facts
out/NativeMethodId.facts
out/NativeMethodTypeCandidate.facts
out/NativeNameCandidate.facts
out/NativeReturnVar.facts
out/NativeXRef.facts
out/NormalHeap.facts
out/NumConstantRaw.facts
out/OperatorAt.facts
out/Param-Annotation.facts
out/PhantomBasedMethod.facts
out/PhantomMethod.facts
out/PhantomType.facts
out/PolymorphicInvocation.facts
out/Properties.facts
out/Return.facts
out/ReturnVoid.facts
out/SensitiveLayoutControl.facts
out/Service.facts
out/SpecialMethodInvocation.facts
out/StatementType.facts
out/StaticMethodInvocation.facts
out/StoreArrayIndex.facts
out/StoreInstanceField.facts
out/StoreStaticField.facts
out/StringConstant.facts
out/StringRaw.facts
out/SuperMethodInvocation.facts
out/TableSwitch-Default.facts
out/TableSwitch.facts
out/TableSwitch-Target.facts
out/ThisVar.facts
out/Throw.facts
out/ThrowNull.facts
out/Type-Annotation.facts
out/Type-SimpleName.facts
out/UnsupportedInstruction.facts
out/Var-DeclaringMethod.facts
out/Var-SimpleName.facts
out/Var-Type.facts
out/VirtualMethodInvocation.facts
out/XMLNodeAttribute.facts
out/XMLNodeData.facts
out/XMLNode.facts

Each facts file corresponds to a different relation, for example Method.facts

ubuntu@ubuntu:~$ cat out/Method.facts|head -10
<java.net.ProxySelector: void <init>()> <init>          java.net.ProxySelector  void    ()V     0
<java.lang.invoke.MethodHandleImpl$CountingWrapper: void <init>(java.lang.invoke.MethodHandle,java.lang.invoke.LambdaForm,java.util.function.Function,java.util.function.Function,int)> <init>  java.lang.invoke.MethodHandle,java.lang.invoke.LambdaForm,java.util.function.Function,java.util.function.Function,int    java.lang.invoke.MethodHandleImpl$CountingWrapper       void    (Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/LambdaForm;Ljava/util/function/Function;Ljava/util/function/Function;I)V       5
<sun.text.normalizer.UBiDiProps$IsAcceptable: void <init>(sun.text.normalizer.UBiDiProps)>      <init>  sun.text.normalizer.UBiDiProps  sun.text.normalizer.UBiDiProps$IsAcceptable     void    (Lsun/text/normalizer/UBiDiProps;)V      1
<java.lang.UNIXProcess$Platform: java.lang.UNIXProcess$Platform[] values()>     values          java.lang.UNIXProcess$Platform  java.lang.UNIXProcess$Platform[]        ()[Ljava/lang/UNIXProcess$Platform;     0
<sun.invoke.util.VerifyAccess: void <init>()>   <init>          sun.invoke.util.VerifyAccess    void    ()V     0
<java.util.WeakHashMap$KeySpliterator: void <init>(java.util.WeakHashMap,int,int,int,int)>      <init>  java.util.WeakHashMap,int,int,int,int   java.util.WeakHashMap$KeySpliterator    void    (Ljava/util/WeakHashMap;IIII)V   5
<java.util.stream.Tripwire: void <init>()>      <init>          java.util.stream.Tripwire       void    ()V     0
<java.util.BitSet: int wordIndex(int)>  wordIndex       int     java.util.BitSet        int     (I)I    1
<sun.invoke.util.VerifyAccess: boolean isMemberAccessible(java.lang.Class,java.lang.Class,int,java.lang.Class,int)>     isMemberAccessible      java.lang.Class,java.lang.Class,int,java.lang.Class,int sun.invoke.util.VerifyAccess     boolean (Ljava/lang/Class;Ljava/lang/Class;ILjava/lang/Class;I)Z        5
<java.net.ProxySelector: java.net.ProxySelector getDefault()>   getDefault              java.net.ProxySelector  java.net.ProxySelector  ()Ljava/net/ProxySelector;      0

Facts files use \t as the default delimiter. Pull out one line:

<sun.invoke.util.VerifyAccess: boolean isMemberAccessible(java.lang.Class,java.lang.Class,int,java.lang.Class,int)>     isMemberAccessible      java.lang.Class,java.lang.Class,int,java.lang.Class,int sun.invoke.util.VerifyAccess     boolean (Ljava/lang/Class;Ljava/lang/Class;ILjava/lang/Class;I)Z        5

This line corresponds to sun.invoke.util.VerifyAccess#isMemberAccessible. Each tab-separated column holds a different attribute of the method.
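As a rough sketch, a line of Method.facts can be split into named columns like this. The field names are my own reading of the sample line and of how MethodInfo is used in the rules later in this post (seven columns, ending in the arity); they are assumptions, not names taken from the generator.

```python
from typing import NamedTuple


class MethodFact(NamedTuple):
    # Field names are assumptions inferred from the sample line.
    method: str           # full Soot-style signature
    simplename: str       # bare method name
    params: str           # comma-separated parameter types, may be empty
    declaring_class: str
    return_type: str
    descriptor: str       # JVM method descriptor
    arity: int            # number of parameters


def parse_method_fact(line: str) -> MethodFact:
    # Split on tabs only: a no-argument method has an empty params column,
    # which a whitespace split would silently drop.
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 7:
        raise ValueError(f"expected 7 columns, got {len(cols)}")
    return MethodFact(*cols[:6], int(cols[6]))


SAMPLE = "\t".join([
    "<sun.invoke.util.VerifyAccess: boolean isMemberAccessible(java.lang.Class,java.lang.Class,int,java.lang.Class,int)>",
    "isMemberAccessible",
    "java.lang.Class,java.lang.Class,int,java.lang.Class,int",
    "sun.invoke.util.VerifyAccess",
    "boolean",
    "(Ljava/lang/Class;Ljava/lang/Class;ILjava/lang/Class;I)Z",
    "5",
])

fact = parse_method_fact(SAMPLE)
print(fact.declaring_class, fact.simplename, fact.arity)
```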

Another example is MethodInvocation-Line.facts

ubuntu@ubuntu:~$ cat out/MethodInvocation-Line.facts |head -10
<sun.invoke.util.VerifyAccess: void <init>()>/java.lang.Object.<init>/0 38
<java.net.ProxySelector: void <init>()>/java.lang.Object.<init>/0       60
<java.util.stream.Tripwire: void <init>()>/java.lang.Object.<init>/0    55
<java.util.WeakHashMap$KeySpliterator: void <init>(java.util.WeakHashMap,int,int,int,int)>/java.util.WeakHashMap$WeakHashMapSpliterator.<init>/0        1102
<sun.text.normalizer.UBiDiProps$IsAcceptable: void <init>(sun.text.normalizer.UBiDiProps)>/java.lang.Object.<init>/0    107
<java.lang.UNIXProcess$Platform: java.lang.UNIXProcess$Platform[] values()>/java.lang.Object.clone/0    81
<java.net.ProxySelector: java.net.ProxySelector getDefault()>/java.lang.System.getSecurityManager/0     92
<java.lang.invoke.MethodHandleImpl$CountingWrapper: void <init>(java.lang.invoke.MethodHandle,java.lang.invoke.LambdaForm,java.util.function.Function,java.util.function.Function,int)>/java.lang.invoke.MethodHandle.type/0     810
<java.lang.invoke.MethodHandleImpl$CountingWrapper: void <init>(java.lang.invoke.MethodHandle,java.lang.invoke.LambdaForm,java.util.function.Function,java.util.function.Function,int)>/java.lang.invoke.DelegatingMethodHandle.<init>/0 810
<java.net.ProxySelector: java.net.ProxySelector getDefault()>/java.lang.SecurityManager.checkPermission/0       94

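It records the line number of each method call.

So with the facts files we have everything static program analysis needs.

Writing rules

soot-fact-generator.jar gives us all the data sets we need.

Let's build up a static analysis step by step.

1 Implementing the class hierarchy

We start with the basic class hierarchy. We need to build a type hierarchy graph, which is used to find the subclasses or superclasses of a class, or to decide whether two classes are related by inheritance.

The generator has already produced the facts for us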

ubuntu@ubuntu:~$ cat out/DirectSuperclass.facts |head -10
java.lang.UNIXProcess$Platform  java.lang.Enum
sun.text.normalizer.UBiDiProps$IsAcceptable     java.lang.Object
java.net.ProxySelector  java.lang.Object
java.lang.invoke.MethodHandleImpl$CountingWrapper       java.lang.invoke.DelegatingMethodHandle
java.util.WeakHashMap$KeySpliterator    java.util.WeakHashMap$WeakHashMapSpliterator
java.util.BitSet        java.lang.Object
sun.invoke.util.VerifyAccess    java.lang.Object
java.util.stream.Tripwire       java.lang.Object
sun.text.normalizer.ReplaceableString   java.lang.Object
java.net.StandardSocketOptions  java.lang.Object

This shows the concrete inheritance relations. extends corresponds to DirectSuperclass.facts, and implements corresponds to DirectSuperinterface.facts.

.type Class <: symbol

.decl DirectSuperclass(child:Class, parent:Class)
.input DirectSuperclass

.decl DirectSuperinterface(child:Class, parent:Class)
.input DirectSuperinterface

Recursively determine the subclass relation

.type Class <: symbol

.decl ClassModifier(mod:symbol, class:Class)
.input ClassModifier

.decl ClassType(class:Class)
.input ClassType

.decl InterfaceType(interface:Class)
.input InterfaceType

.decl DirectSuperclass(child:Class, parent:Class)
.input DirectSuperclass

.decl DirectSuperinterface(child:Class, parent:Class)
.input DirectSuperinterface

.decl SubClass(subclass:Class, class:Class)
.output SubClass


SubClass(subclass, class) :- DirectSuperclass(subclass, class).
SubClass(subclass, class) :- DirectSuperinterface(subclass, class).
SubClass(subclass, class) :-
    (
        DirectSuperclass(subclass, tmp);
        DirectSuperinterface(subclass, tmp)
    ),
    SubClass(tmp, class).

Run it and inspect the output

ubuntu@ubuntu:~$ souffle -F out/ -D . example.dl  ; cat SubClass.csv |head -n 10
java.lang.UNIXProcess$Platform  java.lang.Enum
java.lang.UNIXProcess$Platform  java.lang.Object
java.lang.UNIXProcess$Platform  java.io.Serializable
java.lang.UNIXProcess$Platform  java.lang.Comparable
java.lang.Enum  java.lang.Object
java.lang.Enum  java.io.Serializable
java.lang.Enum  java.lang.Comparable
sun.text.normalizer.UBiDiProps$IsAcceptable     java.lang.Object
sun.text.normalizer.UBiDiProps$IsAcceptable     sun.text.normalizer.ICUBinary$Authenticate
java.net.ProxySelector  java.lang.Object
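The recursion in SubClass can be mirrored in a few lines of Python. This is only a sketch of the rule's logic with a hand-picked subset of facts; the function name subclass_closure is hypothetical.

```python
def subclass_closure(direct_superclass, direct_superinterface):
    """Mirror of the SubClass rules: direct edges plus their transitive closure."""
    direct = set(direct_superclass) | set(direct_superinterface)
    sub = set(direct)  # the two non-recursive rules
    changed = True
    while changed:
        changed = False
        # SubClass(subclass, class) :- direct(subclass, tmp), SubClass(tmp, class).
        for child, tmp in direct:
            for mid, ancestor in list(sub):
                if mid == tmp and (child, ancestor) not in sub:
                    sub.add((child, ancestor))
                    changed = True
    return sub


# A small subset of the real facts, chosen to match the output above.
superclass = {
    ("java.lang.UNIXProcess$Platform", "java.lang.Enum"),
    ("java.lang.Enum", "java.lang.Object"),
}
superinterface = {
    ("java.lang.Enum", "java.io.Serializable"),
    ("java.lang.Enum", "java.lang.Comparable"),
}

for pair in sorted(subclass_closure(superclass, superinterface)):
    print(*pair, sep="\t")
```

With these four facts the sketch derives the same seven tuples for UNIXProcess$Platform and Enum that appear at the top of SubClass.csv.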

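2 Implementing the method call graph

For static calls and special calls the target is fixed at compile time, whereas a virtual call is resolved at run time from the actual type of the receiver object. Determining the run-time type of the receiver is therefore the key to call graph construction.

For the CHA algorithm:

At run time the receiver's type can be any non-abstract subclass of its declared type, so we need a Dispatch relation to perform method dispatch.

Here I simply quote the ByteCodeDL docs: http://github.com/BytecodeDL/ByteCodeDL/blob/main/docs/utils.md#method-dispatch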

Dispatch(simplename, descriptor, class, method) :-
    MethodInfo(method, simplename, _, class, _, descriptor, _),
    !MethodModifier("abstract", method).

Dispatch(simplename, descriptor, class, method) :-
    !MethodInfo(_, simplename, _, class, _, descriptor, _),
    DirectSuperclass(class, superclass),
    Dispatch(simplename, descriptor, superclass, method),
    !MethodModifier("abstract", method).

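The first Dispatch rule says: if class declares a method with the same signature that is not abstract, that method is the result. The second says: if the current class does not declare such a method, look in its superclass for a non-abstract method with the matching signature.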
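The two rules amount to walking up the superclass chain until a declaration is found. Below is a small Python sketch of that walk, followed by a textbook CHA resolution step built on top of it. The classes Animal, Dog, Puppy and Cat and all function names are hypothetical, and cha_targets follows the usual textbook description of CHA rather than the contents of ByteCodeDL's cha.dl.

```python
def dispatch(simplename, descriptor, cls, declared, abstract, superclass):
    """Walk up from cls until a class declares (simplename, descriptor).

    declared:   {(class, simplename, descriptor): method}
    abstract:   set of abstract methods
    superclass: {class: direct superclass}
    """
    while cls is not None:
        method = declared.get((cls, simplename, descriptor))
        if method is not None:
            # Rule 1 only fires for non-abstract methods; rule 2 does not
            # apply once the class itself declares the signature.
            return None if method in abstract else method
        cls = superclass.get(cls)  # rule 2: continue in the superclass
    return None


def cha_targets(simplename, descriptor, declared_type, declared, abstract, superclass):
    """Textbook CHA: dispatch on the declared type and every subclass of it."""
    classes = set(superclass) | set(superclass.values()) | {declared_type}
    targets = set()
    for cls in classes:
        cur = cls
        while cur is not None and cur != declared_type:
            cur = superclass.get(cur)
        if cur == declared_type:  # cls is declared_type or one of its subclasses
            method = dispatch(simplename, descriptor, cls, declared, abstract, superclass)
            if method is not None:
                targets.add(method)
    return targets


# Hypothetical hierarchy: Puppy -> Dog -> Animal, Cat -> Animal.
superclass = {"Dog": "Animal", "Puppy": "Dog", "Cat": "Animal"}
declared = {
    ("Animal", "speak", "()V"): "<Animal: void speak()>",
    ("Dog", "speak", "()V"): "<Dog: void speak()>",
    ("Cat", "speak", "()V"): "<Cat: void speak()>",
}
abstract = {"<Animal: void speak()>"}

print(dispatch("speak", "()V", "Puppy", declared, abstract, superclass))
print(sorted(cha_targets("speak", "()V", "Animal", declared, abstract, superclass)))
```

Puppy inherits Dog's implementation, and a call through a variable of type Animal may reach either Dog.speak or Cat.speak, which is exactly the over-approximation CHA accepts.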

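With dispatch in place we can build the CHA call graph. For the code, see http://github.com/BytecodeDL/ByteCodeDL/blob/main/logic/cha.dl

There is also the RTA algorithm, which I won't cover here; see the docs. ByteCodeDL implements it as well.

3 Using CHA in practice

Different tasks require locating particular classes, and this is where a CHA call graph becomes useful.

The official docs use ezchain from HFCTF 2022 to show CHA in practice. The challenge gives you a getter call, blacklists the known gadget chains, and leaves you to find a getter that leads to RCE.

That gives us the following code.

SinkDesc declares the sinks we are looking for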

#define MAXSTEP 5
#define CHAO 2

#include "../logic/cha.dl"


.decl NonParamPublicMethod(method:Method, class:Class)
.output NonParamPublicMethod

SinkDesc("exec", "java.lang.Runtime").
SinkDesc("<init>", "java.lang.ProcessBuilder").
SinkDesc("start", "java.lang.ProcessImpl").
SinkDesc("loadClass", "java.lang.ClassLoader").
SinkDesc("defineClass", "java.lang.ClassLoader").
SinkDesc("readObject", "java.io.ObjectInputStream").
SinkDesc("readExternal", "java.io.ObjectInputStream").


EntryMethod(method),
Reachable(method, 0),
NonParamPublicMethod(method, class) :- 
    MethodInfo(method, simplename, _, class, _, _, arity),
    MethodModifier("public", method),
    contains("get", simplename),
    arity = 0.

.output SinkMethod

This finds <java.security.SignedObject: java.lang.Object getObject()> as the entry method.

The results can be imported into neo4j for visualization.
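The entry rule above is just a filter over MethodInfo: public, zero arguments, and a simple name that contains "get". A Python sketch of the same filter follows; apart from SignedObject.getObject, the methods listed are made-up examples, and the tuple layout is simplified to the three columns the rule actually uses.

```python
def entry_methods(method_infos, public_methods):
    """Select public, zero-argument methods whose simple name contains "get"."""
    return [
        method
        for (method, simplename, arity) in method_infos
        if method in public_methods
        and "get" in simplename  # contains("get", simplename) is a substring test
        and arity == 0
    ]


method_infos = [
    ("<java.security.SignedObject: java.lang.Object getObject()>", "getObject", 0),
    ("<demo.Example: void setName(java.lang.String)>", "setName", 1),   # hypothetical
    ("<demo.Example: java.lang.String target()>", "target", 0),         # hypothetical
    ("<demo.Example: java.lang.String getSecret()>", "getSecret", 0),   # hypothetical, not public
]
public_methods = {
    "<java.security.SignedObject: java.lang.Object getObject()>",
    "<demo.Example: void setName(java.lang.String)>",
    "<demo.Example: java.lang.String target()>",
}

for method in entry_methods(method_infos, public_methods):
    print(method)
```

Note that the substring test also matches names such as target, so the rule is slightly broader than "all getters".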

bash importOutput2Neo4j.sh neoImportCall.sh dbname

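I won't demo that here.

A side note: compared with tabby, ByteCodeDL uses Souffle and works from a smaller input, so it is faster; it uses pointer analysis, so it is more precise.

The downsides are just as obvious: the syntax is brutal, writing custom rules will cost you your hair, the docs are sparse, the rule library is incomplete, and the barrier to entry is far higher than tabby's.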

4 pta/ptaint Analysis

I can't explain the algorithm well myself, so just read the ByteCodeDL docs.

A simple pointer analysis example

#include "inputDeclaration.dl"
#include "utils.dl"
#include "pt-noctx.dl"

// instantiate the component
.init cipt = ContextInsensitivePt

// seed Reachable
cipt.Reachable(method) :-
    MethodInfo(method, simplename, _, _, _, descriptor, _),
    simplename = "main",
    descriptor = "([Ljava/lang/String;)V".

.output cipt.VarPointsTo

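This seeds the analysis with every method named main whose descriptor is ([Ljava/lang/String;)V and outputs the points-to result set for everything reachable from it.

Here is an excerpt of the result set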

<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void <init>()>/@this
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void <init>()>/this#_0
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/demo#_8
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void test1(java.lang.String)>/@this
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void test1(java.lang.String)>/this#_0
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void Sink(java.lang.String)>/@this
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: void Sink(java.lang.String)>/this#_0
<com.bytecodedl.benchmark.demo.TaintDemo1: void Sink(java.lang.String)>/new java.lang.StringBuilder/0   <com.bytecodedl.benchmark.demo.TaintDemo1: void Sink(java.lang.String)>/builder#_19
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: java.lang.String Source()>/@this
<com.bytecodedl.benchmark.demo.TaintDemo1: void main(java.lang.String[])>/new com.bytecodedl.benchmark.demo.TaintDemo1/0        <com.bytecodedl.benchmark.demo.TaintDemo1: java.lang.String Source()>/this#_0

The corresponding code is

package com.bytecodedl.benchmark.demo;

public class TaintDemo1 {
    public static void main(String[] args) {
        TaintDemo1 demo = new TaintDemo1();
        String name = demo.Source();
        demo.test1(name);
    }

    public void test1(String name){
        String sql = "select * from user where name='" + name + "'";
        Sink(sql);
    }

    public void Sink(String param){
        StringBuilder builder = new StringBuilder();
        builder.append(param);
    }

    public String Source(){
        return "tainted name";
    }
}

Looks right.
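To see why the TaintDemo1 object shows up under @this of test1, Sink and Source, here is a heavily simplified Python sketch of context-insensitive points-to propagation. It handles only allocations and copy assignments, with receiver passing modelled as an assignment to the callee's @this. The shortened variable and heap names are my own, and none of this is taken from pt-noctx.dl.

```python
def points_to(allocs, assigns):
    """Propagate heap objects along assignments until nothing changes.

    allocs:  (var, heap) pairs for statements like  var = new T()
    assigns: (to, frm) pairs for statements like   to = frm
    """
    pts = {}
    for var, heap in allocs:
        pts.setdefault(var, set()).add(heap)
    changed = True
    while changed:
        changed = False
        for to, frm in assigns:
            new = pts.get(frm, set()) - pts.get(to, set())
            if new:
                pts.setdefault(to, set()).update(new)
                changed = True
    return pts


# Shortened, hypothetical names loosely modelled on the output above.
allocs = [
    ("main/demo", "main/new TaintDemo1/0"),
    ("Sink/builder", "Sink/new StringBuilder/0"),
]
assigns = [
    ("test1/@this", "main/demo"),    # demo.test1(name)
    ("Source/@this", "main/demo"),   # demo.Source()
    ("Sink/@this", "test1/@this"),   # Sink(sql) is this.Sink(sql)
]

for var, heaps in sorted(points_to(allocs, assigns).items()):
    print(var, sorted(heaps))
```

A real analysis also has to handle field loads and stores, parameters, return values and call graph construction on the fly, which is what the ByteCodeDL rules take care of.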