1. Home
  2. Knowledge Base
  3. Articles
  4. Handling large datasets with Hubway Connect and Excel

Handling large datasets with Hubway Connect and Excel

This article continues on from a previous post,ย Hubway for data extraction into Excel.ย We recommend you read the previous article to have a better idea of the basics and how we got here.

Hubway Connect enforces an upper limit on the number of records being returned in the response payload of a single OData call. This default setting prevents the impact on runtime performance caused by large datasets. When it comes to large datasets, you need to paginate the API responses, which can be done in the source OData APIs.ย OData query optionsย can be used to pre-filter data and optimise the data retrieval process into Excel.

Applying pagination to large datasets

Letโ€™s assume that the OrderItem Entity has 20,000 records and we want to retrieve them in blocks of 1000 per page. Here is how to instrument Excel to process the dataset in smaller chunks by splitting them into pages.

In MS Excel, go to Data โ†’ Get Data โ†’ From Other Sources โ†’ Blank Query.

Applying pagination to large datasetsโ€‹
pagination to large datasets

Click Advanced Editor and paste the code below. Replace <Service link>/<Entity Name> with your Test or Production Service link and the Entity Name. Also replace ColumnA, ColumnB, etc. with the column names in your Entity.

let
    BaseUrl         = "<Service link>/<Entity Name>",
    RecordsPerPage = 1000,

    GetJson = (Url) =>
        let Options = [
        Headers = [ #"accept" = "application/json" ]
      ],
            RawData = Web.Contents(Url, Options),
            Json    = Json.Document(RawData)
        in  Json,

    GetTextData = (Url) =>
        let options = [ 
        Headers = [ 
            #"accept" = "application/json"] 
        ],
            RawData = Web.Contents(BaseUrl, Options),
            TextData = Text.FromBinary(RawData)
    in TextData,

    GetRecordCount = () => 
        let Url = BaseUrl & "/$count", 
            TextData= GetTextData(Url), 
            Count = Number.FromText(TextData) 
        in Count,

    GetPage = (Index) =>
        let Skip  = "?$skip=" & Text.From(Index * RecordsPerPage),
            Top   = "$top=" & Text.From(RecordsPerPage),
            Url   = BaseUrl & Skip,
            Json  = GetJson(Url),
            Value = Json[#"value"]
        in  Value,

    RecordCount = List.Max({ RecordsPerPage, GetRecordCount() }),
    PageCount   = Number.RoundUp(RecordCount / RecordsPerPage),
    PageIndices = { 0 .. PageCount - 1 },
    Pages       = List.Transform(PageIndices, each GetPage(_)),
    Records    = List.Union(Pages),
    Table       = Table.FromList(Records, Splitter.SplitByNothing(), null,
			null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(Table, "Column1",
{"columnA", "columnB", "columnC"})
in #"Expanded Column1"

Note the use of the 2 OData query options $skip and $top, which are used to add pagination to the API response, ensuring that the dataset is filtered at the source, saving bandwidth and reducing processing time.

When prompted for Credentials, select Basic Authentication and enter the OutSystems username and password for the service link.

Enter a name for your query, for example, โ€œLargeOrderItemโ€

Data loaded through MS Power Query
paginate records

This will paginate records in blocks of 1000, retrieve all the records in the Entity and add the query to your Excel model. Click Close and Load. Power Query Editor will close and load the query into the Excel Worksheet.

Add the newly created LargeOrderItem query to the model by going to Data โ†’ Queries & Connections. Right-click on LargeOrderItem โ†’ Load To โ†’ tick the Add this data to the Data Model and click ok. This will add LargeOrderItem to the Excel model and allow you to build relationships and analysis with the existing datasets in the model.

Add data to Excel Data Model
existing datasets in the model.

You can now add other related Entities to Excel as described in the article Hubway for data extraction into Excelย and build Pivot Charts and Reports.

Entity Relationship
Pivot Charts and Reports.
Simple Pivot Field selection
Pivot Charts and Reports.
Pivot Chart and Report
Pivot Charts and Reports.

Supercharging the synchronisation of large datasets with incremental loads

Let us assume that the OrderItem Entity has 20,000 records and keeps growing at a pace of 1,000 new records daily. We need a daily synchronisation process that will keep importing new data to keep the Excel report up to date. In other words, we only want to retrieve records that were created or updated since the last sync.

You can apply a filter parameter to the OData Hubway service in Excelโ€™s Advanced Query editor and retrieve a smaller and controlled set of records.

 

Using the $filter query option

In the example below, you will see โ€œ?$filter=UpdatedOn ge 2022-10-19T01:00:00Zโ€ that will fetch OrderItem records updated after the 19thย of October 2022. This is straightforward, except that the parameter we use to filter the date will remain fixed for now. We will explain how to make it dynamic later.

Add a new query that will use these Parameters to filter your records. Go to Data โ†’ From Other Sources โ†’ Blank Query. Name the Query as FilteredOrderItem. Click on Advanced Editor and add the script below.

let
    BaseUrl = "https://hubwayconnect.com/HW/rest/odata/sales-order",
    EntityName = "OrderItem",
    FilterColumn = "UpdatedOn",
    FilterOperator = "ge",
    FilterValue = "2022-10-19T01:00:00Z",
    RecordsPerPage = 1000,
    Filter = "$filter="& FilterColumn &" "&FilterOperator&" "&FilterValue,

    GetJson = (BaseUrl) =>
        let Options = [
        Headers = [
            #"accept" = "application/json"]
        ],
            RawData = Web.Contents(BaseUrl, Options),
            Json    = Json.Document(RawData)
        in  Json,

    GetTextData = (Url) =>
        let options = [ 
        Headers = [ 
            #"accept" = "application/json"] 
        ],
            RawData = Web.Contents(BaseUrl, Options),
            TextData = Text.FromBinary(RawData)
        in TextData,
    
    GetRecordCount = () =>
        let Url   = BaseUrl & "/$count",
            TextData= GetTextData(Url),
            Count = Number.FromText(TextData)
        in  Count,

    GetPage = (Index) =>
        let
            Skip  = "?$skip=" & Text.From(Index * RecordsPerPage),
            Url   = BaseUrl & "/" & EntityName & Skip & "&" &Filter,
            Json  = GetJson(Url),
            Value = Json[#"value"]
        in  Value,

    RecordCount = List.Max({ RecordsPerPage, GetRecordCount() }),
    PageCount   = Number.RoundUp(RecordCount / RecordsPerPage),
    PageIndices = { 0 .. PageCount - 1 },
    Pages       = List.Transform(PageIndices, each GetPage(_)),
    Records    = List.Union(Pages),
    Table       = Table.FromList(Records, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(Table, "Column1",
{"Id", "OrderId", "ProductId", "Quantity", "Discount", "Amount", "CreatedOn"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{{"Amount", type number}})
in #"Changed Type"

Note that we have combined a filter by date with a pagination strategy that was discussed previously. This will combine the best of both approaches.

Follow the remaining steps mentioned above to add FilteredOrderItem to the model, establish relationships with other Entities and build Pivot Table reports.

The output will return only the records that were updated on or after the date specified in the filter criteria.

Example of a chart displaying a filtered dataset
date specified in the filter criteria.

Setting a dynamic parameter in Excel

You can make your Excel report more sophisticated by configuring the filter values as parameters. You can also use Excel formulas to derive the date. This will give you the flexibility to change the parameter values dynamically without having to modify the script. This is very handy for Business Analysts and Managers who want to run a report every week to access the latest data.

In the example below, we will maintain the Hubway Service link, Entity Name and filter details in an Excel worksheet. We will also use Excel formulas to dynamically get the date which is 7 days prior to the current date. We will then use these parameters in the Power Query to filter the records from Hubway service.

Create a new Excel worksheet and add the following values:

Parameter name Parameter value
ServiceURL https://www.hubwayconnect.com/HW/rest/odata/sales-order
EntityName OrderItem
FilterColumn UpdatedOn
FilterOperator ge
FilterValue =TEXT(TODAY()-7,โ€YYYY-MM-DDโ€)
RecordsPerPage 1000

Go to Insert โ†’ Table and select the cell range and click OK. This will add this cell range as a Table. You can name the table as โ€œParametersโ€ in the Table Design Menu โ†’ Properties section.

Table Range
Table Range

Table Name
Table Name

The next step is to add these Parameters to the Excel Data Model. Select the cell range which contains the parameters. Go to Data โ†’ Get Data โ†’ From Other Sources โ†’ From Table/Range.

This will open the Power Query Editor. Here you will see that a new table called Parameters is added to the Model.

Next, add a new query that will use these Parameters to filter the records. Go to Data โ†’ From Other Sources โ†’ Blank Query. Name it โ€œParameterizedOrderItemโ€. Click on Advanced Editor and add the script below.

let
    BaseUrl = Parameters{[ParameterName="ServiceURL"]}[ParameterValue],
    EntityName = Parameters{[ParameterName="EntityName"]}[ParameterValue],
    FilterColumn = Parameters{[ParameterName="FilterColumn"]}[ParameterValue],
    FilterOperator = Parameters{[ParameterName="FilterOperator"]}[ParameterValue],
    FilterValue = Parameters{[ParameterName="FilterValue"]}[ParameterValue],
    RecordsPerPage = 1000,
    Filter = "$filter="& FilterColumn &" "&FilterOperator&" "&FilterValue,

    GetJson = (BaseUrl) =>
        let Options = [
        Headers = [
          #"accept" = "application/json"
        ]
      ],
            RawData = Web.Contents(BaseUrl, Options),
            Json    = Json.Document(RawData)
        in  Json,

    GetTextData = (Url) =>
        let options = [ 
        Headers = [ 
            #"accept" = "application/json"] 
        ],
            RawData = Web.Contents(BaseUrl, Options),
            TextData = Text.FromBinary(RawData)
        in TextData,

    GetRecordCount = () => 
        let Url = BaseUrl & "/$count", 
            TextData= GetTextData(Url), 
            Count = Number.FromText(TextData) 
        in Count,

    GetPage = (Index) =>
        let
            Skip  = "?$skip=" & Text.From(Index * RecordsPerPage),
            Url   = BaseUrl & "/" & EntityName & Skip & "&" &Filter,
            Json  = GetJson(Url),
            Value = Json[#"value"]
        in  Value,

    RecordCount = List.Max({ RecordsPerPage, GetRecordCount() }),
    PageCount   = Number.RoundUp(RecordCount / RecordsPerPage),
    PageIndices = { 0 .. PageCount - 1 },
    Pages       = List.Transform(PageIndices, each GetPage(_)),
    Records    = List.Union(Pages),
    Table       = Table.FromList(Records, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(Table, "Column1",
{"Id", "OrderId", "ProductId", "Quantity", "Discount", "Amount", "CreatedOn"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{{"Amount", type number}})
in #"Changed Type"

Click on the Home menu โ†’ Close and Load.

Microsoft may display aย Firewall error, in which case you will have to Ignore the Privacy settings by going to Data โ†’ Queries and Connections. Double-click any Table name to open the Power Query Editor. File โ†’ Options and Settings โ†’ Query Options โ†’ Privacy. Click โ€œIgnore the Privacy Levels and potentially Improve Performanceโ€.

Query options
Query options

Follow the remaining steps mentioned inย Pagination for large datasets, so you can add the ParameterizedOrderItem to the model, establish relationships with other Entities and build Pivot Table reports.

Example of a chart displaying a filtered dataset by a filter dynamically set
Example of a chart displaying a filtered dataset by a filter dynamically set

To fetch the OrderItems which were updated on or after 2022-01-01, you can just change the FilterValue cell and refresh the Pivot Table report. Hubway Connect will apply the filter and fetch the filtered records from OutSystems to Excel.

And voi la!

We demonstrated;

  • how to paginate through a large data set and fetch records in multiple iterations
  • how to apply OData filters to limit the output based on our business needs in order to implement powerful incremental synchronisation processes
  • the parameters in the Excel worksheet dynamic (bonus content!)

Data Analysts can filter data from the data source (for example, Status = Approved) and optimise the performance of data retrieval. By applying a date range filter, Data Stewarts can achieve incremental data load.

Senior Managers and Sales Analysts can build Excel charts and analyse the sales trend over a specific date range or for a specific product category.

The simplicity of OData filters loaded with the power of Hubway Connect helps you build flexible reports in Excel using the data from OutSystemsโ€ฆ all of this in a no-code style.

Was this article helpful?

Related Articles