## Examples

### Load Dataset

- Load data as graphs in `pytorch_geometric` format:

  ```python
  from flowbench.dataset import FlowDataset

  dataset = FlowDataset(root="./", name="montage")
  data = dataset[0]
  ```

  The `data` object contains the structural information, accessible via `data.edge_index`, and the node feature information via `data.x`.

- Load data as tabular data in `pytorch` format:

  ```python
  from flowbench.dataset import FlowDataset

  dataset = FlowDataset(root="./", name="montage")
  data = dataset[0]
  Xs = data.x
  ys = data.y
  ```

  Unlike the graph `pyg.data`, the `data` here only contains the node features.

- Load data as tabular data in `numpy` format:

  ```python
  from flowbench.dataset import FlowDataset

  dataset = FlowDataset(root="./", name="montage")
  data = dataset[0]
  Xs = data.x.numpy()
  ys = data.y.numpy()
  ```
  This is the same as the previous one, but the data is in `numpy` format, which is typically used by models from `sklearn` and `xgboost`.

- Load text data with the `huggingface` interface. We have uploaded our parsed text data to the `huggingface` dataset hub. You can load the data with the following code:

  ```python
  from datasets import load_dataset

  dataset = load_dataset("cshjin/poseidon", "1000genome")
  ```
  The dataset is a `dict` with the keys `train`, `test`, and `validation`.
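The splits are selected by key. A minimal sketch of that access pattern using a plain `dict` stand-in (the real object returned by `load_dataset` is a Hugging Face `DatasetDict`, whose splits are indexed the same way; the field names below are illustrative):

```python
# Toy stand-in for the DatasetDict returned by load_dataset;
# the real splits are Dataset objects, not lists.
dataset = {
    "train": [{"text": "job trace ...", "label": 0}],
    "test": [{"text": "job trace ...", "label": 1}],
    "validation": [{"text": "job trace ...", "label": 0}],
}

train_split = dataset["train"]  # select a split by key
first = train_split[0]          # index into it like a list
print(sorted(dataset.keys()))   # ['test', 'train', 'validation']
```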
## PyOD Models

| Type | Abbr | Algorithm | Year | Class |
|---|---|---|---|---|
| Probabilistic | ABOD | Angle-Based Outlier Detection | 2008 | |
| Probabilistic | KDE | Outlier Detection with Kernel Density Functions | 2007 | |
| Probabilistic | GMM | Probabilistic Mixture Modeling for Outlier Analysis | | |
| Linear Model | PCA | Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) | 2003 | |
| Linear Model | OCSVM | One-Class Support Vector Machines | 2001 | |
| Linear Model | LMDD | Deviation-based Outlier Detection (LMDD) | 1996 | |
| Proximity-Based | LOF | Local Outlier Factor | 2000 | |
| Proximity-Based | CBLOF | Clustering-Based Local Outlier Factor | 2003 | |
| Proximity-Based | kNN | k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) | 2000 | |
| Outlier Ensembles | IForest | Isolation Forest | 2008 | |
| Outlier Ensembles | INNE | Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles | 2018 | |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 | |
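As a concrete illustration of the kNN row above, the scoring rule (distance to the k-th nearest neighbor) can be written in a few lines of `numpy`. This is an illustrative re-implementation of the idea, not `pyod`'s code:

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Score each point by its distance to its k-th nearest neighbor."""
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    # Sort each row; column 0 is the distance to the point itself (0),
    # so the k-th nearest neighbor sits at column k.
    dist.sort(axis=1)
    return dist[:, k]

# A tight cluster near the origin plus one far-away point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = knn_outlier_scores(X, k=2)
print(scores.argmax())  # the isolated point has the largest score
```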
### Example of using GMM

```python
from flowbench.pyod import GMM
from flowbench.dataset import FlowDataset

dataset = FlowDataset(root="./", name="1000genome")
data = dataset[0]
Xs = data.x.numpy()
clf = GMM()
clf.fit(Xs)
y_pred = clf.predict(Xs)
```

Detailed example in `example/demo_pyod.py`.
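For intuition, GMM-based detection fits a mixture density to the features and flags low-likelihood samples. A minimal sketch of that idea with plain `sklearn` on synthetic data (an illustration of the principle, not flowbench's `GMM`):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Inliers from a Gaussian blob, plus a handful of distant outliers.
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(8, 0.5, size=(5, 2))])

gmm = GaussianMixture(n_components=1, random_state=0).fit(X)
log_density = gmm.score_samples(X)          # log-likelihood per sample
threshold = np.quantile(log_density, 0.05)  # flag the lowest 5%
y_pred = (log_density < threshold).astype(int)  # 1 = outlier
```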
## PyGOD Models

| Type | Abbr | Year | Class |
|---|---|---|---|
| Clustering | SCAN | 2007 | |
| GNN+AE | GAE | 2016 | |
| MF | Radar | 2017 | |
| MF | ANOMALOUS | 2018 | |
| MF | ONE | 2019 | |
| GNN+AE | DOMINANT | 2019 | |
| MLP+AE | DONE | 2020 | |
| MLP+AE | AdONE | 2020 | |
| GNN+AE | AnomalyDAE | 2020 | |
| GAN | GAAN | 2020 | |
| GNN+AE | DMGD | 2020 | |
| GNN | OCGNN | 2021 | |
| GNN+AE+SSL | CoLA | 2021 | |
| GNN+AE | GUIDE | 2021 | |
| GNN+AE+SSL | CONAD | 2022 | |
| GNN+AE | GADNR | 2024 | |
### Example of using GAE

```python
from flowbench.unsupervised.pygod import GAE
from flowbench.dataset import FlowDataset

dataset = FlowDataset(root="./", name="1000genome")
data = dataset[0]
clf = GAE()
clf.fit(data)
```

Detailed example in `example/demo_pygod.py`.
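Most of the AE-type detectors in the table above score nodes by reconstruction error. The principle, stripped of the GNN, can be sketched with a linear autoencoder (top principal component) in plain `numpy`; this is purely illustrative, not any of the listed models:

```python
import numpy as np

rng = np.random.default_rng(0)
# Inliers lying near a 1-D subspace of R^3, plus one off-subspace outlier.
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(100, 3))
X = np.vstack([X, [[0.0, 5.0, 0.0]]])

# Linear "autoencoder": project onto the top principal component and
# reconstruct; the reconstruction error is the anomaly score.
Xc = X - X.mean(0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:1]                 # encoder/decoder weights (top PC)
X_hat = Xc @ W.T @ W       # reconstruction from the 1-D code
scores = ((Xc - X_hat) ** 2).sum(1)
print(scores.argmax())     # the off-subspace point reconstructs worst
```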
## Supervised Models

### Example of using MLP

```python
from flowbench.supervised.mlp import MLPClassifier
from flowbench.dataset import FlowDataset

dataset = FlowDataset(root="./", name="1000genome")
data = dataset[0]
clf = MLPClassifier()
clf.fit(data)
```

Detailed example in `example/demo_supervised.py`.
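flowbench's `MLPClassifier` is assumed here to follow the usual fit/predict pattern; the equivalent supervised workflow on plain tabular arrays can be shown with `sklearn`'s `MLPClassifier` on synthetic data (purely illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins for node features (Xs) and anomaly labels (ys).
Xs = rng.normal(size=(300, 8))
ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(Xs, ys, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # held-out accuracy
```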
## Supervised fine-tuned LLMs

Example of using LoRA (Low-Rank Adaptation) for supervised fine-tuned LLMs:

```python
from datasets import load_dataset
from peft import LoraConfig, TaskType
from transformers import Trainer, TrainingArguments

dataset = load_dataset("cshjin/poseidon", "1000genome")
# data processing
...
# LoRA config
peft_config = LoraConfig(task_type=TaskType.SEQ_CLS,
                         inference_mode=False,
                         r=8,
                         lora_alpha=32,
                         lora_dropout=0.1)
training_args = TrainingArguments(...)
# LoRA trainer
trainer = Trainer(peft_model, ...)
trainer.train()
...
```

Detailed example in `example/demo_sft_lora.py`.