# Data Cleaning Endpoints

The Data Cleaning Suite provides a set of endpoints to:

- Authenticate
- Create a Job
- Upload a File
- Update Mappings & Enrichments
- Retrieve the Enriched File

The flow diagram below shows the order in which to call these endpoints.

## Flow Diagram

```mermaid
graph TD
    A(Authenticate) --> B(Create Job)
    B --> C(Upload File)
    C --> D(Update Mappings)
    D --> E(Submit Job)
    E --> F(Check Job Status)
    F --> G(Update Enrichments)
    G --> H(Start Enrichment)
    H --> I(Retrieve Enriched File)
```

## 1. Authenticate

You must authenticate before calling any other endpoint. Authentication confirms that you have permission to access the data.

### Example Request

```http
POST /authenticate
```

## 2. Create A Job

This endpoint creates a data cleaning job, which acts as a container for the file and all subsequent actions. Each job is uniquely identified by the `id` returned in the response.

### Example Request

```http
POST /dataCleaning/jobs
```

#### Example requestBody

```json
{
  "name": "Data Cleaning Job 03-10-20xx"
}
```

#### Example Response

```json
{
  "id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
  "name": "Testing From Technical Author",
  "createdAt": "2025-02-07T14:07:10.8766667",
  "modifiedAt": "2025-02-07T14:07:10.8766667",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "owningCustomerId": 987654321,
  "owningUserId": 123456789,
  "status": "created",
  "source": "dataCleaning",
  "archived": false
}
```

## 3. Upload A Job File

This endpoint uploads the file to be processed. Pass the `id` from the job creation step in the path to associate the file with the job. Send the file as `form-data`, and use the `hasHeader` property to state whether the file includes a header row.
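
The upload request described above can be sketched with the Python standard library. This is a minimal illustration, not the official client: the base URL, the `file` form field name, the `text/csv` content type, and the Bearer-token `Authorization` header are all assumptions, since this guide only specifies the path, the `form-data` encoding, and the `hasHeader` property.

```python
import uuid
import urllib.request

# Hypothetical base URL; replace with the real API host.
BASE_URL = "https://api.example.com"


def build_upload_request(job_id, filename, content, has_header, token,
                         base_url=BASE_URL):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="hasHeader"',
        b"",
        b"true" if has_header else b"false",
        delimiter,
        # Assumption: the file part is named "file".
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: text/csv",
        b"",
        content,
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(parts)
    return urllib.request.Request(
        url=f"{base_url}/dataCleaning/jobs/{job_id}/upload",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumption: the token from the authenticate step is sent
            # as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON response body.
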

### Example Request

```http
POST /dataCleaning/jobs/{id}/upload
```

#### Example Response

```json
{
  "correlationId": "2a7b5537-3950-4903-810d-9814c91d5564",
  "id": "9e824d9e-0e77-43ef-1f10-08dd712f5830",
  "sourceFilename": "test-file-input.csv",
  "hasHeader": true,
  "createdAt": "2025-04-03T12:24:32.2266667",
  "modifiedAt": "2025-04-03T12:24:32.2266667",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "status": "uploaded",
  "active": true
}
```

## 4. Update Mappings

This endpoint maps the columns of the uploaded file to the fields required for matching. Use the available ENUMs (column headers) to map your file's columns to the Creditsafe database.

> **NOTE** Column positions are zero-based: the first column is at position `0`.

### Example Request

```http
PUT /dataCleaning/jobs/{id}/mappings
```

#### Example requestBody

```json
[
  { "mapping": "companyId", "value": "0" },
  { "mapping": "orgNumber", "value": "1" },
  { "mapping": "name", "value": "2" }
]
```

> **NOTE** Ensure your column headers match the available ENUMs as closely as possible. This forms the basis of the matching process.

## 5. Submit Job

This endpoint submits the file for matching against the Creditsafe database. The job `id` must be passed in the path, and an empty request body is required.

### Example Request

```http
POST /dataCleaning/jobs/{id}/submit
```

## 6. Return Job By Id Number

This endpoint retrieves the current status of the job. Call it periodically to track progress. The job must reach the `jobMatchingComplete` status before you proceed to enrichment.
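
The periodic status check can be sketched as a small polling helper. This is an illustrative sketch only: `wait_for_status` and its parameters are hypothetical names, and `fetch_job` is a caller-supplied function that performs `GET /dataCleaning/jobs/{id}` and returns the parsed JSON as a dict. The interval and attempt limit are arbitrary defaults, not values recommended by the API.

```python
import time


def wait_for_status(fetch_job, target_status, interval_s=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Poll fetch_job() until the job reports target_status.

    fetch_job is a hypothetical callable returning the job as a dict.
    Raises TimeoutError if the status is not reached in max_attempts calls.
    """
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") == target_status:
            return job
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(
        f"job did not reach status {target_status!r} "
        f"after {max_attempts} checks"
    )
```

The same helper applies to both polling steps in the flow: pass `"jobMatchingComplete"` after submitting the job and `"enrichmentComplete"` after starting enrichment.
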

### Example Request

```http
GET /dataCleaning/jobs/{id}
```

#### Example Response

```json
{
  "id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
  "name": "Testing From Technical Author",
  "createdAt": "2019-08-24T14:15:22Z",
  "modifiedAt": "2019-08-24T14:15:22Z",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "owningCustomerId": 987654321,
  "owningUserId": 123456789,
  "status": "jobMatchingComplete",
  "source": "dataCleaning",
  "jobSummary": {
    "totalRows": 0,
    "matched": 20,
    "manualMatched": 0,
    "unmatched": 0,
    "duplicates": 0
  },
  "archived": true
}
```

## 7. Update Enrichments

This endpoint applies the desired enrichment type to the matched data. Enrichment types include:

- basic
- basicPlus
- standard

### Example Request

```http
PUT /dataCleaning/jobs/{id}/enrichments
```

#### Example requestBody

You can remove any properties that you do not need for the chosen credit type. You cannot add tags beyond the maximum number allowed for that credit type.

```json
{
  "enrichments": [
    { "enrichment": "general.safeNumber" },
    { "enrichment": "general.connectId" },
    { "enrichment": "general.ggsId" },
    { "enrichment": "general.companyName" }
  ]
}
```

> **NOTE** Refer to the API documentation for the full list of allowable enrichments for each type.

## 8. Start Enrichment

This endpoint submits the request to enrich the matched data. The job `id` must be passed in the path, and an empty request body is required.

### Example Request

```http
POST /dataCleaning/jobs/{id}/enrich
```

## 9. Return Job By Id Number

This endpoint checks the status of the enrichment request and can be called repeatedly for periodic checks. The next step (Return Enriched Job File) cannot be carried out until the job reaches the `enrichmentComplete` status. The job `id` must be passed in the path.

### Example Request

```http
GET /dataCleaning/jobs/{id}
```

#### Example Response

```json
{
  "id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
  "name": "Testing From Technical Author",
  "createdAt": "2019-08-24T14:15:22Z",
  "modifiedAt": "2019-08-24T14:15:22Z",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "owningCustomerId": 987654321,
  "owningUserId": 123456789,
  "status": "enrichmentComplete",
  "countryCode": "GB",
  "portfolioId": "string",
  "source": "dataCleaning",
  "jobSummary": {
    "totalRows": 0,
    "matched": 20,
    "manualMatched": 0,
    "unmatched": 0,
    "duplicates": 0
  },
  "jobEnrichmentSettings": {
    "creditType": "basic"
  },
  "archived": true
}
```

## 10. Return Enriched Job File

This endpoint retrieves the completed, enriched file. The job `id` must be passed in the path. By default, the response is a `.csv` file. If the file contains fewer than 300,000 rows, it can also be returned as `.xlsx`.

### Example Request

```http
GET /dataCleaning/jobs/{id}/enrichedFile
```

#### Example Response

```json
{
  "correlationId": "string",
  "filePath": "string"
}
```
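
As a recap, the full sequence of calls in this guide can be written down as an ordered list of method and path pairs. The sketch below only assembles the paths documented above; `workflow_steps` is a hypothetical helper name, and it sends no requests.

```python
from typing import List, Tuple


def workflow_steps(job_id: str) -> List[Tuple[str, str]]:
    """Return the documented calls in order as (HTTP method, path) pairs.

    job_id is the `id` returned by the Create A Job step, so in practice
    it is only known once the second call has completed.
    """
    job_path = f"/dataCleaning/jobs/{job_id}"
    return [
        ("POST", "/authenticate"),
        ("POST", "/dataCleaning/jobs"),
        ("POST", f"{job_path}/upload"),
        ("PUT", f"{job_path}/mappings"),
        ("POST", f"{job_path}/submit"),
        ("GET", job_path),  # repeat until status is jobMatchingComplete
        ("PUT", f"{job_path}/enrichments"),
        ("POST", f"{job_path}/enrich"),
        ("GET", job_path),  # repeat until status is enrichmentComplete
        ("GET", f"{job_path}/enrichedFile"),
    ]


if __name__ == "__main__":
    for method, path in workflow_steps("f31c786a-1fa8-44d2-8193-c61d77ca2acd"):
        print(method, path)
```
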