Cyber Data
Aggregate-level Cyber Attack Data
- Hackmageddon : Cyber attack timelines by month. Drill down for detail. The 2016 summaries link to Google Sheets where you can get the raw data; the older timelines seem to show only PNGs.
- Center for Strategic and International Studies (CSIS) : A timeline of significant cyber incidents since 2006. It focuses on cyber attacks on government agencies, defense and high-tech companies, or economic crimes with losses of more than a million dollars.
Domain Names
- Cisco Umbrella - Top 1 Million : A free list of the top 1 million most popular domains
- Majestic Million - Site : Free search and download of the top million websites
- Majestic Million - Data : Majestic Million Daily snapshot
- DomainIQ : Great site for domain intel
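
Top-domain lists like the ones above are usually distributed as CSV files. The snippet below is a minimal sketch of loading one, assuming a two-column `rank,domain` layout with an optional header row. That layout is an assumption, so check the actual file first, since some lists carry extra columns. The sample data and the `load_ranked_domains` helper are hypothetical and only illustrate the idea.

```python
import csv
import io


def load_ranked_domains(lines, limit=None):
    """Parse `rank,domain` rows into a list of (rank, domain) tuples.

    `lines` is any iterable of text lines, such as an open file.
    """
    rows = []
    for record in csv.reader(lines):
        if len(record) < 2:
            continue  # skip blank or malformed rows
        rank = record[0].strip()
        domain = record[1].strip().lower()
        if not rank.isdigit():
            continue  # skip a header row, if one is present
        rows.append((int(rank), domain))
        if limit is not None and len(rows) >= limit:
            break
    return rows


# Made-up sample standing in for a downloaded list (assumed layout).
SAMPLE = "1,example.com\n2,Example.org\n\n3,example.net\n"
top = load_ranked_domains(io.StringIO(SAMPLE), limit=2)
print(top)
```

With a real download, pass the open file object instead of the `io.StringIO` wrapper. The `limit` argument avoids reading all million rows when only the head of the list is needed.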
Low-Level Data (PCAP, Zeek, etc)
- AndersonPaschoalon PCAPs : Great set of PCAPs
- Military Academy CDX Data @ Flyn Computing : Great set of PCAPs and supporting exercise documents from the 2017 US Military Academies CDX. Curated (and heavily researched) by Mike Petullo of Flyn Computing.
- Canadian Institute for Cybersecurity Datasets
- Collegiate Penetration Testing Competition Exercise Data : Data from the Collegiate Penetration Testing Competition. Splunk stream data, logs, winevents, and more. No PCAPs here.
- Cyber Security Data Mining Competition (CDMC) : Competition for cyber data mining. The data is not publicly available, but the competition prompts are interesting to read.
- Global Collegiate Penetration Testing Competition : Global Collegiate Penetration Testing Competition data
- DAPT2020 Dataset for Advanced Persistent Threats : This dataset, DAPT 2020, was created with network traffic collected over 5 days, where each day can be considered analogous to 3 months in a real-world scenario.
- KDD CUP 99 : A baseline dataset for IDS. See: Stolfo, J., et al. “Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection.” Results from the JAM Project by Salvatore (2000).
- ICS-Security-Tools PCAPs : Nice collection of ICS PCAPs – MELSEC, Zigbee, CIP and many more.
- Kitsune PCAPs : PCAPs for the Kitsune ANN paper
- Malware Analysis PCAPs : Good repo for PCAPs with challenges
- NETRESEC Data : Lots of CDX and CTF data!
- NSL-KDD : Addresses issues with the earlier KDD CUP 99 dataset (see above). Also available from here
- PandaCAP : PandaCap PCAP, a dataset of 63 PANDA traces from a honeynet, collected using the PANDAcap framework
- Predict : Protected Repository for the Defense of Infrastructure Against Cyber Threats.
- SecRepo : Big list of links
- Security Datasets : Open-source initiative that contributes malicious and benign datasets, from different platforms, to the infosec community to expedite data analysis and threat research
- SimpleWeb : PCAP traces for various network events/attacks
- StratosphereLabs : Fantastic site of various captures and analysis
- TMInfosec Datasets : Curated list of PCAPs by @TMInfosec
- DoH – Real-World : A very large ISP network
- CESNET-TLS22 : A large dataset for fine-grained classification of TLS services
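
Before loading a downloaded capture into heavier tooling, it can help to confirm that the file really is a classic pcap and to see which link type it uses. The sketch below decodes the 24-byte classic pcap global header using only the standard library. The `parse_pcap_global_header` helper and the synthetic header are illustrative, not part of any dataset above. Files in pcapng format use a different layout and are rejected here.

```python
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NSEC = 0xA1B23C4D  # classic pcap, nanosecond timestamps


def parse_pcap_global_header(data):
    """Decode the 24-byte classic pcap global header from a bytes object."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    # The magic number reveals the byte order the writer used.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", data[:4])[0]
        if magic in (PCAP_MAGIC_USEC, PCAP_MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng or another format?)")
    # Fields: version major/minor, timezone offset, timestamp accuracy,
    # snapshot length, link-layer type.
    major, minor, _thiszone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24]
    )
    return {
        "byte_order": "little" if endian == "<" else "big",
        "nanosecond": magic == PCAP_MAGIC_NSEC,
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,  # 1 is Ethernet
    }


# Synthetic header standing in for the first 24 bytes of a real capture.
header = struct.pack("<IHHiIII", PCAP_MAGIC_USEC, 2, 4, 0, 0, 65535, 1)
info = parse_pcap_global_header(header)
print(info)
```

For a real file, read the first 24 bytes in binary mode (`open(path, "rb").read(24)`) and pass them to the function.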
Mocking Data
- LoremFlickr : Generates pictures from Creative Commons licensed photos
- RoboHash : Generates unique robot images as visual hashes
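
Both services above are driven by the request URL, so mock image links can be built without any client library. The sketch below only constructs URLs and makes no network calls. The URL patterns (`robohash.org/<text>.png?size=WxH` and `loremflickr.com/<width>/<height>/<keyword>`) are assumptions based on the services' commonly documented usage, so verify them against each site before relying on them. The helper names are hypothetical.

```python
from urllib.parse import quote


def robohash_url(text, size=200):
    """Build a RoboHash image URL for arbitrary text (assumed URL pattern)."""
    # Percent-encode the text so characters like '@' or '/' stay in one path segment.
    return f"https://robohash.org/{quote(text, safe='')}.png?size={size}x{size}"


def loremflickr_url(width, height, keyword):
    """Build a LoremFlickr placeholder photo URL (assumed URL pattern)."""
    return f"https://loremflickr.com/{width}/{height}/{quote(keyword, safe='')}"


avatar = robohash_url("alice@example.com")
photo = loremflickr_url(320, 240, "dog")
print(avatar)
print(photo)
```

Because the same input text always yields the same URL, these links work well as stable placeholder avatars in test fixtures.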
Threat Data / Intel
Indicators of Compromise (IOCs)
- APT Digital Weapon : Large and updated IOC set from Qi-AnXin
- APT & Cybercriminal Campaign Collection : Great source for APT and cybercriminal campaigns.
- ESET IOCs : Well-organized and broad list of IOCs from various investigations, donated by WeLiveSecurity
- UNC2452 : An active campaign
- 401trg
- openphish : Phishing IOCs
Threat Feeds
Malware
- Malpedia : Excellent aggregation of malware and the trail it leaves
- InQuestLabs : The InQuest platform provides high-throughput Deep File Inspection (DFI) for threat and data leakage prevention, detection, and hunting.
- Enron Emails : Massive trove of emails from early 2000s
General
- AWS Open Data : Large scale AWS-ready datasets
- Stack Overflow Archive : Historical Stack Overflow data (posts, replies, etc.)
GIS / Location
- Equator Studios LiDAR : Great source for LiDAR
- GeoNames : GeoNames contains over 25 million geographical names and consists of over 12 million unique features, of which 4.8 million are populated places, plus 16 million alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. Free / CC.
Images
- 80 Million Tiny Images : 80 Million 32 x 32 labeled images
Media
- GDELT project : The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Geotagged social media, news, and other NLP data.
Data Search Engines / Aggregators
- Caesar0301 Awesome List of Public Datasets : Pretty big list; categorized
- Impact Cyber Trust : Mass repo of cyber data