Examples

Load Dataset

Load the data as graphs in PyTorch Geometric format:
```
from flowbench.dataset import FlowDataset
dataset = FlowDataset(root="./", name="montage")
data = dataset[0]
```
The `data` object exposes the graph structure through `data.edge_index` and the node features through `data.x`.
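To make the layout of `data.edge_index` concrete, here is a minimal sketch that turns a PyTorch Geometric style edge list (two rows: sources and targets) into an adjacency mapping. The toy edge list and the helper name `to_adjacency` are illustrative assumptions, not values or functions from flowbench.

```python
# Toy stand-in for data.edge_index: row 0 holds the source node of each edge,
# row 1 holds the target node. These values are made up for illustration and
# do not come from the montage dataset.
edge_index = [
    [0, 0, 1, 2],  # source nodes
    [1, 2, 3, 3],  # target nodes
]
num_nodes = 4


def to_adjacency(edge_index, num_nodes):
    """Return {node: [successors]} for a 2 x E edge list (hypothetical helper)."""
    sources, targets = edge_index
    adjacency = {node: [] for node in range(num_nodes)}
    for src, dst in zip(sources, targets):
        adjacency[src].append(dst)
    return adjacency


adjacency = to_adjacency(edge_index, num_nodes)
print(adjacency)  # {0: [1, 2], 1: [3], 2: [3], 3: []}
```

With a real graph, the same helper should work on `data.edge_index.tolist()`, assuming the tensor has the usual `[2, num_edges]` shape.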

Load the data as tabular data in PyTorch format:
```
from flowbench.dataset import FlowDataset
dataset = FlowDataset(root="./", name="montage")
data = dataset[0]
Xs = data.x
ys = data.y
```
Unlike the PyTorch Geometric graph object, this tabular view uses only the node features (`data.x`) and labels (`data.y`); the edge structure is ignored.

Load the data as tabular data in NumPy format:
```
from flowbench.dataset import FlowDataset
dataset = FlowDataset(root="./", name="montage")
data = dataset[0]
Xs = data.x.numpy()
ys = data.y.numpy()
```
This is the same as the previous example, but the arrays are in NumPy format, which is what models from scikit-learn and XGBoost typically expect.

Load text data through the Hugging Face interface. We have uploaded our parsed text data as a Hugging Face dataset, which you can load with the following code:
```
from datasets import load_dataset
dataset = load_dataset("cshjin/poseidon", "1000genome")
```
The loaded dataset is a dict-like object with the keys `train`, `test`, and `validation`.

PyOD Models

| Type | Abbr | Algorithm | Year | Class |
| --- | --- | --- | --- | --- |
| Probabilistic | ABOD | Angle-Based Outlier Detection | 2008 | `flowbench.unsupervised.pyod.ABOD` |
| Probabilistic | KDE | Outlier Detection with Kernel Density Functions | 2007 | `flowbench.unsupervised.pyod.KDE` |
| Probabilistic | GMM | Probabilistic Mixture Modeling for Outlier Analysis | | `flowbench.unsupervised.pyod.GMM` |
| Linear Model | PCA | Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) | 2003 | `flowbench.unsupervised.pyod.PCA` |
| Linear Model | OCSVM | One-Class Support Vector Machines | 2001 | `flowbench.unsupervised.pyod.OCSVM` |
| Linear Model | LMDD | Deviation-based Outlier Detection (LMDD) | 1996 | `flowbench.unsupervised.pyod.LMDD` |
| Proximity-Based | LOF | Local Outlier Factor | 2000 | `flowbench.unsupervised.pyod.LOF` |
| Proximity-Based | CBLOF | Clustering-Based Local Outlier Factor | 2003 | `flowbench.unsupervised.pyod.CBLOF` |
| Proximity-Based | kNN | k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) | 2000 | `flowbench.unsupervised.pyod.KNN` |
| Outlier Ensembles | IForest | Isolation Forest | 2008 | `flowbench.unsupervised.pyod.IForest` |
| Outlier Ensembles | INNE | Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles | 2018 | `flowbench.unsupervised.pyod.INNE` |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 | `flowbench.unsupervised.pyod.LSCP` |

Example of using GMM:
```
from flowbench.unsupervised.pyod import GMM
from flowbench.dataset import FlowDataset
dataset = FlowDataset(root="./", name="1000genome")
data = dataset[0]
# PyOD models take NumPy arrays, so convert the node features
Xs = data.x.numpy()
clf = GMM()
clf.fit(Xs)
y_pred = clf.predict(Xs)
```
A detailed example is in `example/demo_pyod.py`.
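Once `y_pred` is available, it can be compared against the ground-truth labels. The sketch below computes basic binary metrics in plain Python. It assumes that label 1 marks an anomaly and 0 a normal node, which matches the PyOD `predict` convention but is an assumption about `data.y`. The helper name `binary_metrics` and the toy labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, kept normal
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only (not flowbench results).
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With the real data, something like `binary_metrics(data.y.tolist(), y_pred.tolist())` should work, assuming `data.y` holds one 0/1 label per node.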

PyGOD Models

| Type | Abbr | Year | Class |
| --- | --- | --- | --- |
| Clustering | SCAN | 2007 | `flowbench.unsupervised.pygod.SCAN` |
| GNN+AE | GAE | 2016 | `flowbench.unsupervised.pygod.GAE` |
| MF | Radar | 2017 | `flowbench.unsupervised.pygod.Radar` |
| MF | ANOMALOUS | 2018 | `flowbench.unsupervised.pygod.ANOMALOUS` |
| MF | ONE | 2019 | `flowbench.unsupervised.pygod.ONE` |
| GNN+AE | DOMINANT | 2019 | `flowbench.unsupervised.pygod.DOMINANT` |
| MLP+AE | DONE | 2020 | `flowbench.unsupervised.pygod.DONE` |
| MLP+AE | AdONE | 2020 | `flowbench.unsupervised.pygod.AdONE` |
| GNN+AE | AnomalyDAE | 2020 | `flowbench.unsupervised.pygod.AnomalyDAE` |
| GAN | GAAN | 2020 | `flowbench.unsupervised.pygod.GAAN` |
| GNN+AE | DMGD | 2020 | `flowbench.unsupervised.pygod.DMGD` |
| GNN | OCGNN | 2021 | `flowbench.unsupervised.pygod.OCGNN` |
| GNN+AE+SSL | CoLA | 2021 | `flowbench.unsupervised.pygod.CoLA` |
| GNN+AE | GUIDE | 2021 | `flowbench.unsupervised.pygod.GUIDE` |
| GNN+AE+SSL | CONAD | 2022 | `flowbench.unsupervised.pygod.CONAD` |
| GNN+AE | GADNR | 2024 | `flowbench.unsupervised.pygod.GADNR` |

Example of using GAE:
```
from flowbench.unsupervised.pygod import GAE
from flowbench.dataset import FlowDataset
dataset = FlowDataset(root="./", name="1000genome")
data = dataset[0]
clf = GAE()
clf.fit(data)
```
A detailed example is in `example/demo_pygod.py`.

Supervised Models

Example of using MLP:
```
from flowbench.supervised.mlp import MLPClassifier
from flowbench.dataset import FlowDataset
dataset = FlowDataset(root="./", name="1000genome")
data = dataset[0]
clf = MLPClassifier()
clf.fit(data)
```
A detailed example is in `example/demo_supervised.py`.

Supervised fine-tuned LLMs

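Example of using LoRA (Low-Rank Adaptation) for supervised fine-tuning of LLMs:
```
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import Trainer, TrainingArguments

dataset = load_dataset("cshjin/poseidon", "1000genome")
# data processing
...
# LoRA config
peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
# wrap the base model with LoRA adapters (assumes `model` was loaded in the elided step above)
peft_model = get_peft_model(model, peft_config)
training_args = TrainingArguments(...)
# LoRA trainer
trainer = Trainer(peft_model, ...)
trainer.train()
...
```
A detailed example is in `example/demo_sft_lora.py`.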
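To give a feel for what `r=8` and `lora_alpha=32` mean, the sketch below counts the trainable parameters LoRA adds to a single weight matrix and computes the default scaling factor `lora_alpha / r`. The 768 x 768 layer size and the helper name `lora_summary` are assumptions chosen for illustration; the actual sizes depend on the base model.

```python
def lora_summary(d_in, d_out, r, lora_alpha):
    """Parameter counts for one LoRA-adapted weight matrix (hypothetical helper)."""
    # The frozen weight W has d_out x d_in entries.
    full_params = d_in * d_out
    # LoRA trains B (d_out x r) and A (r x d_in); the update added to W
    # is (lora_alpha / r) * B @ A.
    lora_params = r * (d_in + d_out)
    return {
        "full_params": full_params,
        "lora_params": lora_params,
        "trainable_fraction": lora_params / full_params,
        "scaling": lora_alpha / r,
    }


# 768 x 768 is an assumed layer size, not taken from the flowbench example.
summary = lora_summary(d_in=768, d_out=768, r=8, lora_alpha=32)
print(summary["lora_params"], summary["full_params"], summary["scaling"])
# 12288 589824 4.0
```

Under these assumptions the adapter trains roughly 2% of the parameters of the matrix it wraps, which is why LoRA fine-tuning is much cheaper than updating the full weights.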