# Datasets

We pre-compressed a few datasets using k2.

* DBLP 2017
* ArchivesHub
* Jamendo
* Scholarlydata
* DBpedia (en) in sections (version 03.2021)
* Full DBpedia (en) (version 03.2021)
* Full DBpedia for all other languages available (version 10.2016)
* Full DBpedia for all other languages available with NIF & raw_table datasets (version 10.2016)

The DBpedia (en) files were accessed on 9 April 2021 and derived from https://databus.dbpedia.org/dbpedia/collections/latest-core

The DBpedia files for all other languages available were derived from http://downloads.dbpedia.org/2016-10/core-i18n/ .
All TTL and N-Triples files were merged into one RDF file and then compressed.
If you also want the NIF and raw_table datasets, use the dbpedia-10_2016-nif files.
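The merge step described above is straightforward for N-Triples, since each non-empty line is a self-contained triple, so files can simply be concatenated. (Turtle files would first need conversion to N-Triples, e.g. with a converter such as Apache Jena's riot; that step is omitted here.) A minimal sketch with two made-up single-triple files standing in for the real dataset parts:

```shell
# Create two tiny N-Triples files (placeholders for real dataset parts).
printf '<http://example.org/a> <http://example.org/p> "1" .\n' > part1.nt
printf '<http://example.org/b> <http://example.org/p> "2" .\n' > part2.nt

# N-Triples is line-based, so merging is plain concatenation.
cat part1.nt part2.nt > merged.nt

wc -l < merged.nt   # 2 triples
```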

# How to use 

Every compressed dataset consists of two files:

* DATASET.k2
* DATASET.k2.dict

The first file holds the actual graph (the triples); the second is an HDT dictionary.

Download both into the same folder and use rdf2k2 to decompress them.
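Before running the decompressor it can help to confirm that both parts of a dataset sit in the same folder, since the graph file is useless without its dictionary. A minimal sketch (the dataset name `scholarly` is just an example, and the empty files created here are placeholders for the real downloads):

```shell
DATASET=scholarly
mkdir -p demo && cd demo

# Placeholders standing in for the actual downloaded files.
touch "$DATASET.k2" "$DATASET.k2.dict"

# Check that both parts of the dataset are present.
for f in "$DATASET.k2" "$DATASET.k2.dict"; do
  [ -f "$f" ] || { echo "missing: $f" >&2; exit 1; }
done
echo "both files present"
```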

## Decompression

Either use our Java implementation at https://github.com/dice-group/GraphCompression/releases/download/v1.1.0-k2/rdf2k2-1.1.0-k2.jar
or our C++ version at https://github.com/dice-group/rdf2k2-cpp, which is much faster and less memory-intensive.

Let's say the dataset is scholarly, i.e. you have scholarly.k2 and scholarly.k2.dict:

```
./rdf2k2 -d -tkd2 scholarly.k2 scholarly.nt 
```

This uses scholarly.k2 and scholarly.k2.dict to decompress the graph and writes the result to an N-Triples file called scholarly.nt.
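A quick way to sanity-check the decompressed output: in N-Triples every non-empty line is one triple terminated by ` .`, so counting lines counts triples. A small sketch using a stand-in file (the two triples below are made up for illustration; in practice you would inspect the scholarly.nt produced by rdf2k2):

```shell
# Stand-in for a decompressed output file.
cat > scholarly.nt <<'EOF'
<http://example.org/s> <http://example.org/p> "o1" .
<http://example.org/s> <http://example.org/p> "o2" .
EOF

wc -l < scholarly.nt          # 2 -> number of triples
grep -c ' \.$' scholarly.nt   # 2 -> lines that end like a triple
```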


