AI vs. AI: Can AI Detect AI-Generated Images?
1. Introduction
2. Related Work
2.1. Image-to-Image Synthesis
2.1.1. Image-to-Image Synthesis through Conditional GAN (cGAN)
2.1.2. Image-to-Image Synthesis through Transformers
2.2. Sketch-to-Image Synthesis
2.3. Text-to-Image Synthesis
2.3.1. Text-to-Image Synthesis through Attention Module
2.3.2. Text-to-Image Synthesis through Contrastive Learning
2.3.3. Text-to-Image Synthesis through Deep Fusion Block (DFBlock)
3. Data Collection and Methodology
3.1. Data Collection: Real or Synthetic Images (RSI)
3.2. Methodology
4. Results and Analysis
4.1. Evaluation Metrics
4.2. Experimental Results on RSI
4.3. Effectiveness of Our Model
4.4. Ablation Study
4.5. Experimental Results on Other Datasets
5. Discussion and Limitations
6. Conclusions and Future Work
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Task | Model | Input | Training Set | Validation Set | Testing Set | Total |
Image-to-image synthesis | OASIS [14] | Semantic mask map | 2000 | 1000 | 1000 | 4000 |
Image-to-image synthesis | CC-FPSE [15] | Semantic mask map | 2000 | 1000 | 1000 | 4000 |
Image-to-image synthesis | SPADE [16] | Semantic mask map | 2000 | 1000 | 1000 | 4000 |
Image-to-image synthesis | Taming-transformers [17] | Semantic mask map | 2000 | 1000 | 1000 | 4000 |
Sketch-to-image synthesis | S2I-DetectoRS [18] | Sketch | 2000 | 1000 | 1000 | 4000 |
Sketch-to-image synthesis | S2I-HTC [18] | Sketch | 2000 | 1000 | 1000 | 4000 |
Sketch-to-image synthesis | S2I-QueryInst [18] | Sketch | 2000 | 1000 | 1000 | 4000 |
Sketch-to-image synthesis | S2I-MaskRCNN [18] | Sketch | 2000 | 1000 | 1000 | 4000 |
Text-to-image synthesis | AttnGAN [19] | Text | 2000 | 1000 | 1000 | 4000 |
Text-to-image synthesis | DM-GAN+CL [20] | Text | 2000 | 1000 | 1000 | 4000 |
Text-to-image synthesis | DF-GAN [21] | Text | 2000 | 1000 | 1000 | 4000 |
Text-to-image synthesis | ControlGAN [22] | Text | 2000 | 1000 | 1000 | 4000 |
Total | | | 24,000 | 12,000 | 12,000 | 48,000 |
Model | Precision | Recall | F1 | Accuracy | AP | ROC-AUC | FPR | FNR |
VGG19 | 0.94 | 0.94 | 0.94 | 0.94 | 0.9819 | 0.9803 | 0.053 | 0.064 |
ResNet50 | 0.93 | 0.91 | 0.91 | 0.91 | 0.9933 | 0.9927 | 0.0035 | 0.168 |
ResNet101 | 0.95 | 0.95 | 0.95 | 0.95 | 0.9879 | 0.9877 | 0.028 | 0.08 |
ResNet152 | 0.92 | 0.92 | 0.92 | 0.92 | 0.9743 | 0.9718 | 0.042 | 0.118 |
InceptionV3 | 0.98 | 0.98 | 0.98 | 0.98 | 0.9976 | 0.9974 | 0.016 | 0.03 |
Xception | 0.97 | 0.97 | 0.97 | 0.97 | 0.9995 | 0.9994 | 0.0003 | 0.054 |
DenseNet121 | 0.97 | 0.97 | 0.97 | 0.97 | 0.9969 | 0.9966 | 0.012 | 0.044 |
InceptionResNetV2 | 0.96 | 0.96 | 0.96 | 0.96 | 0.9942 | 0.9943 | 0.037 | 0.036 |
MixConv | 0.94 | 0.94 | 0.94 | 0.94 | 0.9411 | 0.9412 | 0.057 | 0.056 |
MaxViT | 0.92 | 0.86 | 0.89 | 0.89 | 0.9375 | 0.9375 | 0.087 | 0.137 |
EfficientNetB4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 0.0 |
Task combination | Precision | Recall | F1 | Accuracy | AP | ROC-AUC | FPR | FNR |
S2I_T2I | 0.99 | 0.99 | 0.99 | 0.99 | 0.9998 | 0.9997 | 0.026 | 0.001 |
I2I_T2I | 0.87 | 0.83 | 0.82 | 0.83 | 0.9684 | 0.9801 | 0.347 | 0.0 |
I2I_S2I | 0.96 | 0.95 | 0.95 | 0.95 | 0.99997 | 0.99997 | 0.093 | 0.0 |
Task | Precision | Recall | F1 | Accuracy | AP | ROC-AUC | FPR | FNR |
Image-to-image (I2I) | 0.97 | 0.97 | 0.97 | 0.9690 | 0.9961 | 0.9957 | 0.032 | 0.03 |
Sketch-to-image (S2I) | 0.98 | 0.98 | 0.98 | 0.9790 | 0.9977 | 0.9974 | 0.012 | 0.03 |
Text-to-image (T2I) | 0.98 | 0.98 | 0.98 | 0.9835 | 0.9991 | 0.9990 | 0.003 | 0.03 |
Model | Input Modality | Precision | Recall | F1 | Accuracy | AP | ROC-AUC | FPR | FNR |
OASIS [14] | I2I | 0.96 | 0.96 | 0.96 | 0.964 | 0.9958 | 0.9951 | 0.042 | 0.03 |
CC-FPSE [15] | I2I | 0.98 | 0.98 | 0.98 | 0.983 | 0.9983 | 0.9981 | 0.004 | 0.03 |
SPADE [16] | I2I | 0.98 | 0.98 | 0.98 | 0.978 | 0.9979 | 0.9977 | 0.014 | 0.03 |
Taming-transformers [17] | I2I | 0.95 | 0.95 | 0.95 | 0.951 | 0.9928 | 0.9920 | 0.068 | 0.03 |
S2I-DetectoRS [18] | S2I | 0.98 | 0.98 | 0.98 | 0.981 | 0.9977 | 0.9974 | 0.008 | 0.03 |
S2I-HTC [18] | S2I | 0.98 | 0.98 | 0.98 | 0.977 | 0.9977 | 0.9973 | 0.016 | 0.03 |
S2I-QueryInst [18] | S2I | 0.98 | 0.98 | 0.98 | 0.978 | 0.9977 | 0.9975 | 0.014 | 0.03 |
S2I-MaskRCNN [18] | S2I | 0.98 | 0.98 | 0.98 | 0.980 | 0.9978 | 0.9974 | 0.010 | 0.03 |
AttnGAN [19] | T2I | 0.98 | 0.98 | 0.98 | 0.984 | 0.9990 | 0.9989 | 0.002 | 0.03 |
DM-GAN+CL [20] | T2I | 0.98 | 0.98 | 0.98 | 0.984 | 0.9996 | 0.9996 | 0.002 | 0.03 |
DF-GAN [21] | T2I | 0.98 | 0.98 | 0.98 | 0.982 | 0.9986 | 0.9985 | 0.006 | 0.03 |
ControlGAN [22] | T2I | 0.98 | 0.98 | 0.98 | 0.984 | 0.9991 | 0.9990 | 0.002 | 0.03 |
Model | Input Modality | Used Dataset | Precision | Recall | F1 | Accuracy | AP | ROC-AUC | FPR | FNR |
OASIS [14] | I2I | ADE20K [57] | 0.91 | 0.89 | 0.89 | 0.889 | 0.9839 | 0.9826 | 0.008 | 0.22 |
S2I-DetectoRS [18] | S2I | Sketchy [58] | 0.95 | 0.94 | 0.94 | 0.943 | 0.9998 | 0.9998 | 0.0 | 0.13 |
AttnGAN [19] | T2I | CUB-200-2011 [59] | 0.98 | 0.98 | 0.98 | 0.978 | 0.9991 | 0.9988 | 0.0 | 0.04 |
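
The Precision, Recall, F1, Accuracy, FPR, and FNR columns in the tables above can all be derived from the four confusion-matrix counts of a real-vs-synthetic classifier. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: it uses the standard binary definitions and treats the synthetic class as positive (label 1). The tables do not state which class is positive or whether precision and recall are averaged over both classes, so reported values may differ from what this sketch produces. The names `BinaryMetrics` and `binary_metrics` are hypothetical. AP and ROC-AUC are omitted because they require prediction scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    precision: float
    recall: float
    f1: float
    accuracy: float
    fpr: float  # false positive rate: FP / (FP + TN)
    fnr: float  # false negative rate: FN / (FN + TP)


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1) -> BinaryMetrics:
    """Compute threshold-based metrics from hard labels.

    Assumption: `positive` marks the synthetic class; every other
    label is treated as the negative (real) class.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1

    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    fpr = _safe_div(fp, fp + tn)
    fnr = _safe_div(fn, fn + tp)
    return BinaryMetrics(precision, recall, f1, accuracy, fpr, fnr)


if __name__ == "__main__":
    # Toy labels: 4 synthetic (1) and 4 real (0) images.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Under these definitions, recall equals 1 minus FNR for the positive class, which is one way to check whether a reported table uses per-class or averaged values.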
