
Table 7 File size reduction for each domain after each step of the pre-processing algorithm

From: Comprehensive structured knowledge base system construction with natural language presentation

| Domain | Original size | Size after Excel conversion | Size after null and invalid values removed | File size after split (KB) |
| --- | --- | --- | --- | --- |
| Agent | 13 GB | 5 GB (60%) | 260 MB (98%) | 800 |
| Animal | 250 MB | 61 MB (75%) | 23 MB (91%) | 820 |
| Chemical | 15 MB | 3.1 MB (79%) | 1.7 MB (89%) | 705 |
| Event | 134 MB | 44 MB (68%) | 8.1 MB (94%) | 718 |
| University | 32 MB | 7.5 MB (77%) | 3.4 MB (89%) | 690 |
| Game | 1 MB | 188 KB (82%) | 132 KB (87%) | 660 |
| Album | 327 MB | 116 MB (65%) | 13 MB (96%) | 650 |
| Organization | 1.32 GB | 564 MB (57%) | 46 MB (96%) | 680 |
| Place | 4.43 GB | 1.8 GB (60%) | 125 MB (97%) | 720 |
| Work | 1.7 GB | 663 MB (61%) | 27 MB (98%) | 710 |
| Average | | 68.4% | 93.5% | 716 |

  1. The second column shows the original size of each file
  2. Columns 3 and 4 show the file size after the program converts the file to Excel and after it removes the null and invalid values; the percentage reduction is given in parentheses. Column 5 shows the file size in KB after the file is split. The bottom row gives the average for each step
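
The percentages in parentheses can be checked with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes each percentage is measured against the original file size (which matches, for example, the Animal row, 250 MB to 23 MB), and the function name `reduction_percent` and the two lists are hypothetical names introduced here. The lists simply copy the percentages printed in the table.

```python
def reduction_percent(original_mb: float, reduced_mb: float) -> float:
    """Percentage by which a file shrank, relative to its original size.

    Assumption: the table's percentages are relative to the original size,
    not to the size after the previous step.
    """
    if original_mb <= 0:
        raise ValueError("original size must be positive")
    return (1 - reduced_mb / original_mb) * 100


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Percentages as printed in the table, in row order (Agent ... Work).
excel_step = [60, 75, 79, 68, 77, 82, 65, 57, 60, 61]
clean_step = [98, 91, 89, 94, 89, 87, 96, 96, 97, 98]

# Animal row: 250 MB originally, 23 MB after null and invalid values removed.
animal_clean = reduction_percent(250, 23)

print(f"Animal, after cleaning: {animal_clean:.1f}%")
print(f"Average after Excel conversion: {mean(excel_step):.1f}%")
print(f"Average after cleaning: {mean(clean_step):.1f}%")
```

The two averages reproduce the values in the bottom row of the table. Individual rows may differ by about a percentage point from the printed figures, since the sizes in the table are themselves rounded.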