# Isle of Man Freedom of Information Open Data

> Structured datasets extracted from Freedom of Information disclosures
> published by Isle of Man Government authorities, 2017-2026.

## What This Is

This collection contains machine-readable data extracted from 4,139 Freedom of Information (FOI) requests made to 23 Isle of Man Government public authorities. The original responses were published as PDF documents on the Isle of Man Government's FOI disclosure log.

This project downloaded those PDFs, extracted their text with OCR, used a large language model (LLM) to produce structured summaries, and parsed embedded tables into JSON and CSV. The result is 4,139 case summaries covering every processed request, plus 1,674 cases with extractable data tables (11,485 tables, 174,229 data rows).

## How It Is Structured

The data is organised in two layers:

1. **Case summaries** (`cases.jsonl`): one JSON object per line for all 4,139 FOI cases, containing metadata, an LLM-generated summary, key facts, topic tags, and an outcome classification. No table data.
2. **Themed datasets**: nine themes, each covering the FOI cases whose extracted tables relate to that theme. Every theme is published as a JSON file (with nested table data) and a CSV file (flattened rows).

## File Inventory

### cases.jsonl (5.1 MB, 4,139 lines)

The most important file for LLM use. One JSON object per line, covering every processed FOI case with structured summaries but no table data.
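Because each line of `cases.jsonl` is a standalone JSON object, the file can be streamed rather than loaded whole. The sketch below is one way to do that with only the standard library, combined with the authority, date, topic, and outcome filters used elsewhere in this README. `iter_cases` and `matches` are hypothetical helper names, not part of the dataset, and the sketch assumes every line is valid JSON.

```python
import json
from typing import Iterable, Iterator, Optional


# Hypothetical helper: stream FOI cases from cases.jsonl (any iterable of lines).
def iter_cases(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one FOI case dict per non-blank line of cases.jsonl."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Hypothetical helper: apply optional filters to one case.
def matches(
    case: dict,
    authority: Optional[str] = None,
    since: Optional[str] = None,
    topic: Optional[str] = None,
    outcome: Optional[str] = None,
) -> bool:
    """Return True if the case passes every filter that is set (None = ignore)."""
    if authority is not None and case.get("authority") != authority:
        return False
    # date_received is an ISO 8601 "YYYY-MM-DD" string, so plain string
    # comparison orders dates correctly without parsing them.
    if since is not None and (case.get("date_received") or "") < since:
        return False
    if topic is not None and topic not in (case.get("topics") or []):
        return False
    if outcome is not None and case.get("outcome") != outcome:
        return False
    return True
```

Passing an open file handle, e.g. `open("cases.jsonl", encoding="utf-8")`, as `lines` reads one case at a time; something like `[c for c in iter_cases(f) if matches(c, topic="waiting times", since="2023-01-01")]` then selects the matching cases.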
### Themed Datasets (JSON + CSV pairs)

| File                 | Cases | Tables |   Rows | Description                                                |
|----------------------|------:|-------:|-------:|------------------------------------------------------------|
| spending.json/csv    |   414 |  2,515 | 45,226 | Government budgets, expenditure, procurement, finances     |
| healthcare.json/csv  |   310 |  2,955 | 54,311 | Patient data, waiting times, hospital stats, prescriptions |
| education.json/csv   |   233 |  2,440 | 29,771 | Schools, pupil numbers, exam results, attainment           |
| transport.json/csv   |   234 |  1,134 | 17,420 | Roads, traffic, vehicles, public transport, airport        |
| environment.json/csv |   106 |  1,033 | 11,275 | Energy, waste, water, emissions, wildlife, agriculture     |
| employment.json/csv  |   125 |    465 |  5,221 | Staff numbers, sick leave, salaries, workforce data        |
| crime.json/csv       |    91 |    226 |  2,454 | Criminal offences, arrests, prosecutions, policing         |
| planning.json/csv    |    90 |    315 |  3,260 | Planning applications, housing, property, development      |
| other.json/csv       |    71 |    402 |  5,291 | Cases not matching any specific theme                      |

### all.json (41 MB)

All themed datasets combined into a single JSON file, with the same structure as the individual themed files.

### manifest.json

Machine-readable metadata with dataset statistics.

## Schema of cases.jsonl

Each line is a JSON object with these fields:

```
{
  "case_id":          integer  — Unique numeric ID for the FOI case
  "title":            string   — Subject line of the FOI request
  "authority":        string   — Public authority (e.g. "Department of Health and Social Care")
  "date_received":    string   — ISO 8601 date (e.g. "2023-06-15")
  "outcome":          string   — Result classification (see values below)
  "summary":          string   — Plain-English summary of the request and response
  "key_facts":        string[] — Notable factual statements from the response
  "data_mentioned":   string[] — Specific figures, dates, and data points cited
  "exemptions_cited": string[] — Legal exemptions used to withhold information
  "topics":           string[] — Topic tags (e.g. ["healthcare", "waiting times"])
  "notable":          boolean  — Whether the case was flagged as particularly noteworthy
}
```

### Outcome Values

- "All information sent"
- "Some information sent but part exempt"
- "Some information sent but not all held"
- "Information not held"
- "No information sent - all held but exempt"
- "Request lapsed - requested information not provided"
- "Withdrawn"
- "Vexatious request"
- "Repeated request"
- "Not required to fulfill request"
- "Neither confirm or deny information held"
- "Decision Notice - Complaint not upheld"
- "Decision Notice - Complaint part upheld"
- "Decision Notice - Complaint upheld"
- "Upheld - full"
- "Upheld - partial"
- "Not upheld"
- "Information Notice"

### Authorities (23 total)

| Authority                                    | Cases |
|----------------------------------------------|------:|
| Department of Infrastructure                 |   741 |
| Department of Health and Social Care         |   511 |
| Cabinet Office                               |   489 |
| Department of Education, Sport and Culture   |   448 |
| Manx Care                                    |   341 |
| Department of Environment, Food and Agriculture |   331 |
| Department for Enterprise                    |   276 |
| Department of Home Affairs                   |   186 |
| Treasury                                     |   183 |
| Isle of Man Constabulary                     |   139 |
| Manx Utilities Authority                     |   128 |
| Attorney General's Chambers                  |    92 |
| Isle of Man Post Office                      |    60 |
| Office of the Clerk of Tynwald               |    54 |
| General Registry                             |    29 |
| Manx National Heritage                       |    27 |
| Isle of Man Financial Services Authority     |    24 |
| Communications Commission                    |    23 |
| Gambling Supervision Commission              |    14 |
| Road Transport Licensing Committee           |    13 |
| Public Sector Pensions Authority             |    13 |
| Financial Intelligence Unit                  |     9 |
| Office of Fair Trading                       |     8 |

## Schema of Themed JSON Files

Each themed file (e.g. `crime.json`) is a JSON object:

```
{
  "dataset":      string  — Theme name
  "description":  string  — Theme description
  "source":       string  — "Isle of Man Freedom of Information Disclosure Log"
  "record_count": integer — Number of FOI cases
  "table_count":  integer — Number of data tables
  "row_count":    integer — Total data rows
  "generated":    string  — Generation date
  "cases": [
    {
      "case_id":          integer
      "title":            string
      "authority":        string
      "date_received":    string
      "outcome":          string
      "summary":          string
      "key_facts":        string[]
      "data_mentioned":   string[]
      "exemptions_cited": string[]
      "tables": [
        {
          "headers":   string[]   — Column names for this table
          "rows":      string[][] — 2D array of cell values
          "row_count": integer
          "col_count": integer
        }
      ]
    }
  ]
}
```

## Schema of CSV Files

Each CSV file has a header row:

`case_id, authority, date_received, title, col_1, col_2, ... col_46`

Within the CSV, rows where the `title` column starts with `[HEADER]` contain the actual column names for the table that follows (since each table has its own schema). Filter by `case_id` to isolate a single FOI response. Data columns (`col_1` through `col_46`) have varying meanings depending on the table; consult the preceding `[HEADER]` row.

## How to Query This Data

### For case-level questions (summaries, topics, outcomes):

Use `cases.jsonl`. Load it line by line, parse each JSON object, and filter or search by any field.

### For tabular data (actual numbers, statistics, figures):

Use the themed JSON files. Each case's `tables` array contains the structured data with headers and rows.

### For bulk data analysis:

Use the CSV files. They can be loaded directly into pandas, R, or any spreadsheet tool. Remember to handle the `[HEADER]` rows.
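Because every table in a themed CSV shares the same 46 generic `col_*` columns, loading a file as one flat frame mixes unrelated schemas. The sketch below, using only the standard library, splits a themed CSV back into per-table blocks. It assumes that a `[HEADER]` row holds the column names in its `col_*` cells and that a table's data rows carry the same `case_id` as its `[HEADER]` row; `iter_csv_tables` is a hypothetical helper name, not part of the dataset.

```python
import csv
from typing import Iterable, Iterator


# Hypothetical helper: regroup a themed CSV into one dict per extracted table.
def iter_csv_tables(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"case_id", "headers", "rows"} for each table in a themed CSV.

    Assumes each table starts with a row whose `title` begins with "[HEADER]"
    and whose col_* cells hold that table's column names; following rows with
    the same case_id are treated as that table's data rows.
    """
    # skipinitialspace tolerates a header written as "case_id, authority, ..."
    reader = csv.DictReader(lines, skipinitialspace=True)
    col_keys = [k for k in (reader.fieldnames or []) if k.startswith("col_")]
    current = None
    for row in reader:
        # Short rows come back with None for missing cells; normalise to "".
        values = [row.get(k) or "" for k in col_keys]
        if (row.get("title") or "").startswith("[HEADER]"):
            if current is not None:
                yield current
            # Drop empty padding columns beyond this table's real width.
            while values and values[-1] == "":
                values.pop()
            current = {"case_id": row["case_id"], "headers": values, "rows": []}
        elif current is not None and row["case_id"] == current["case_id"]:
            # Keep only as many cells as this table has column names.
            current["rows"].append(values[: len(current["headers"])])
    if current is not None:
        yield current
```

Open the file with `newline=""`, as the Python `csv` documentation recommends (e.g. `open("crime.csv", newline="", encoding="utf-8")`), and pass the handle as `lines`. Trimming trailing empty cells assumes no table ends in a genuinely blank column name; if one does, that last column is dropped from the headers and from every data row.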
### Useful filters:

- By authority: `authority == "Department of Health and Social Care"`
- By date range: `date_received >= "2023-01-01"`
- By topic: `"waiting times" in topics`
- By outcome: `outcome == "All information sent"`

## Example Questions an LLM Could Answer

Using `cases.jsonl` (summaries):

- "What FOI requests were made about hospital waiting times?"
- "How many requests did the Department of Infrastructure receive in 2024?"
- "Which authorities most frequently cite exemptions to withhold data?"
- "What are the most common topics in FOI requests?"
- "Summarise all FOI cases related to cannabis policy on the Isle of Man."
- "Were any FOI requests made about government spending on consultants?"

Using themed datasets (tables):

- "What were the crime statistics for Douglas in 2022?"
- "How has school enrolment changed over the past five years?"
- "What is the government's annual expenditure on road maintenance?"
- "How many patients were on hospital waiting lists in 2023?"
- "What are the recycling rates on the Isle of Man?"
- "How many police officers are employed by the Isle of Man Constabulary?"
- "What was the average class size in primary schools?"

Cross-dataset analysis:

- "Compare healthcare spending trends with waiting time data."
- "Which government departments have the highest staff sick-leave rates?"
- "Is there a correlation between planning applications and housing data?"

## Temporal Coverage

Requests span 3 January 2017 to 5 March 2026. The underlying data in tables may reference earlier periods, depending on the FOI question asked.

## Spatial Coverage

Isle of Man, a self-governing British Crown Dependency in the Irish Sea.

## License

Open Government Licence v3.0
https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/