Link2Feed Database Access Suite

This document outlines the technical details of the Database Access Suite as introduced in https://link2feed.atlassian.net/wiki/spaces/LDM/pages/1955725464

What is the Database Access Suite?

The core mechanism behind this suite is providing access via a TLS/SSL-secured connection to a set of secure MySQL/MariaDB views on a read-replica of the Link2Feed Intake database systems. The connection, data formatting, and access is all managed through existing MySQL standards that can connect seamlessly into a large number of data tools.

Data is available either anonymized, or fully identified. Access security is managed through MySQL standard tools and an IP address allow-list.

The system is useful as either a fully queryable data access tool, or as a base point for a synchronization system.

How Do I Access It?

Reach out to your Link2Feed Account manager, who will work to amend your current contract and coordinate with the Link2Feed support team to do the initial setup.

The Link2Feed support team will provide connection details including an access endpoint URL, a username and a password.

Documentation

Entity Relationship Diagram (ERD)

These documents are a series of Entity Relationship Diagrams based on the MySQL VIEWs that the Database Access Suite provides. This is available in both PNG format as a large image, and also in a MySQL Workbench format (a free software).

MySQL Workbench can be downloaded at: https://dev.mysql.com/downloads/workbench/

These sample files have the prefix hfb_ but in a test or production environment these prefixes would be different. All other parts of the table names would remain the same.

MySQL DDL Queries for Creating Replica Datastore

This document is a series of MySQL DDL queries for creating tables that mirror the format and relationship of the database views that the access suite can reach. It includes a prefix on each table name. This mirrors our deployment pattern, where each client is given a short code to represent their food bank (eg- St. Mark’s Food Bank might be shortened as smfb). In the sample file, this prefix is l2f_test_, and a new code would be provided for your food bank in its place.

 

Database Fields Summary

Note:  Table names are marked with a sample prefix. This prefix varies per-client. For example: Acme Food Bank may be given tables named afb_program_visit_activities, where afb is the prefix. In these examples, for consistency, the prefix is l2f_test_.

Core Visit Tables (Every Recorded Visit Creates These)

l2f_test_program_visit_activities

  • The core record of visits. This is the trunk of the tree.

  • Many of the following will have a program_visit_activity_id field which refers back to the id field of this table.

  • On anonymized solution, would convert client_id to a one-way-hashed alphanumeric string to allow uniquely identifying clients, but not tying them to their visible L2F IDs.

  • These records include a is_completed field that is set to true only when a visit has been fully completed. The data may include scheduled appointments that a client never showed up for. (eg- is_completed = false, is_appointment = true).

  • These records include a total_members column. This is NOT a count of the members of the household that participated in the visit, but instead the number of members in the household as a whole. For example:

    • Link2Feed has the ability in the UI to select which clients in a household were served, in this case there may be three l2f_test_program_visit_household_members records tied to this program_visit_activity_id, but the total_members field may have the value 5. This is not an error, it simply states that “This is a household of 5, but only 3 people participated in this visit.”

    • There are also some programs (like CSFP) which allow only one member of the household to be the visitor of record for each visit. In this case, there will only be one l2f_test_program_visit_household_members record tied to this program_visit_activity_id, but other members of the household may be accounted in total_members. In a household of two seniors, each receiving CSFP services, there may well be two l2f_test_program_visit_activities records for the same day, with the same household_id, but different client_id, and each only having one l2f_test_program_visit_household_members associated record, but a total_members count of 2. This does not mean 4 people visited, It meant that the same household of 2 people was served twice: the two seniors living together.

  • Some of these records will be missing program_id, module_id, and some other details. These records are used in our system to tie to a survey or assessment response so that demographic details of survey and assessment responders can be included in the survey and assessment exports. We record these in a way similar to a “visit snapshot” so that they export easily, but these are not actual program visits in the same way that someone receiving food is a program visit, these are used by the survey and assessment systems. In the future, if we expose our survey and assessment systems to the API layer, this link will be more clear, and this data will be more useful, but for now it can effectively be ignored.

 

l2f_test_program_visit_household_addresses

  • A record of the household's address.

  • Several choices for anonymization here. Can remove the record entirely. Or remove address_line_1, address_line_2, latitude and longitude. Or keep latitude and longitude but only to 2 or 3 decimal places.

  • The field names of these records reflect only one regional preference. For convenience’s sake, this is the mapping for those fields:

    • state: State or Province

    • ward: County or Borough

    • postcode: Zip Code or Postal Code

 

l2f_test_program_visit_household_members

  • The core record of visitors. The major branches of the tree.

  • Has both a client_id which uniquely identifies the person in the system, and an id field itself. So I might be Client 1234 (client_id), but I am Visitor 234567 (id).

  • Would remove first_name, last_name, middle_name and date_of_birth fields on anonymized solution.

  • Several of the following tables refer back to this table’s id column either as program_visit_household_member_id or as household_member_id.

  • On anonymized solution, would convert client_id to a one-way-hashed alphanumeric string to allow uniquely identifying clients, but not tying them to their visible L2F IDs.

 

l2f_test_program_visit_household_dietary_considerations

  • Household dietary considerations tab selections. Refers to l2f_test_data_types through dietary_consideration_id.

  • NOTE: Everyone in the household is considered to have the same dietary considerations, so that, for example, if Client 1 has an allergy, but Client 2 picks up the food, that the allergy appears in the profile of Client 2. So in a household of 4, if one has a nut allergy, there will be four records in this table for that same nut allergy.

 

l2f_test_program_visit_household_expenses

  • Household expenses selections, if recorded for the household. Refers to the l2f_test_data_types table through expense_type_id.

 

l2f_test_program_visit_household_income

  • Individual income selections. Refers to l2f_test_data_types through income_type_id, and refers to l2f_test_program_visit_household_members through household_member_id.

 

l2f_test_program_visit_household_languages

  • Household language selections. Refers to l2f_test_languages through language_id.

 

l2f_test_program_visit_household_member_contact_info

  • Individual phone numbers and email addresses. Refers to l2f_test_program_visit_household_members through household_member_id.

 

l2f_test_program_visit_household_member_ethnicity_types

  • Individual Ethnicity choices. Refers to l2f_test_data_types through ethnicity_type_id.

 

l2f_test_program_visit_household_member_self_identity_types

  • Individual Self-Identifies As (Disabled, Veteran, Pregnant, etc.) choices. Refers to l2f_test_data_types through self_identity_type_id.

 

l2f_test_program_visit_household_member_social_program_types

  • Individual Social Programs (SNAP, WIC, Medicaid, etc.) choices. Refers to l2f_test_data_types through social_program_type_id

  • (DEPRECATED) - This field was deprecated, and holds historical information, but does not reflect changes to client profiles.

 

l2f_test_program_visit_household_social_program_types

  • Household Social Programs (SNAP, WIC, Medicaid, etc.) choices. Refers to l2f_test_data_types through social_program_type_id

Core Metadata (Every Recorded Visit References These)

l2f_test_countries

  • Static list of countries, used with address records.

 

l2f_test_data_types

  • Dynamic list of data entry options created by L2F or users (eg- Self-Identifies As, Ethnicities, Expenses, etc.)

  • May have a parent type, which is useful for cross-network standardization. Refers to l2f_test_data_types on parent_id.

 

l2f_test_languages

  • Static list of languages that households speak.

 

l2f_test_modules

  • Configuration table used to identify different L2F system components/tools, for example the difference between pantry and CSFP.

  • Referred to both by l2f_test_program_visit_activities and l2f_test_programs on the module_id field.

 

l2f_test_organizations

  • Agencies, Networks, Regional and National organizations. Referred to in a number of tables by organization_id.

 

l2f_test_programs

  • Configuration table used to identify the "New Visit" button pressed by the user, for example "New CSFP Visit" vs "New TEFAP Visit".

  • Referred to by l2f_test_program_visit_activities on the program_id field. The type field is used to group common programs together (eg- all tefap_food_bank visits).

 

l2f_test_provisions

  • Dynamic list of foods, items and services provided on visit records.

 

l2f_test_workers

  • Workers who log in and utilize the system. Referred to by a great number of tables on the created_by and updated_by fields referring to the guid field of this table.

Non-Core Tables (Not Every Visit Creates These)

l2f_test_client_signature_records

  • Records of signatures for visits. 

 

l2f_test_household_notes

  • Notes recorded during a visit.

 

l2f_test_program_bulk_log

  • Anonymous, bulk records.

    • The properties field includes several pieces of useful data, however its format has changed over the years.

      • types

        • A count of Adult, Senior and Child participants. The oldest records will include this data.

      • provisions

        • A mapping of provision IDs (ie- IDs on the data_types table) to a value.

          • Prior to 2022-09-18, provisions on bulk programs were only provided as checkboxes. (eg- You could check “Bread” but not say “10 Bread”). Data prior to 2022-09-18 will present this as, for example, {"1234":"Bread"} which represents “Bread was involved with this visit.”

          • After 2022-09-18, provisions on bulk programs could include numerical values OR checkboxes. Data after 2022-09-18 represents the value as a count, rather than the name of the provision, for example, {"1234":"10"} which represents “10 items of provision 1234 were involved with this visit.” When we look up provision 1234, we will find it is “Bread”. So 10 Bread were involved.

      • counters

        • A mapping of counter IDs (ie- IDs on the data_types table) to a count. Counters were added later to the system to accommodate recording bulk values that were not Adults, Seniors or Children.

          • For example "counters":{"12345":"900"} represents that there were 900 of Counter 12345 used on the visit. When we look up counter 12345, we may find that this is 900 Households.

      • date

        • This is the date that the visit was recorded FOR. This is distinct from the date the visit was recorded AT, which is held in the created_at column. For example, it’s possible to record on June 25th for a bulk event that happened on June 10th – in this case created_at would be June 25th, but date would be June 10th.

      • items_other and food_other

        • These hold the typed values entered into the box labelled “Provide details of other Food Items supplied on this visit” or “Provide details of other Non-Food Items supplied on this visit”.

      • notes

        • This holds the value of any notes typed into the notes field.

    • The records field is a summative count of all types and counters included in the visit, for example a bulk record of 20 Adults and 10 Households would have 30 records.

 

l2f_test_program_visit_activity_properties

  • Extra properties. Primarily used to fill out "other specify" types of provisions.

 

l2f_test_program_visit_activity_provisions

  • Food, Items and Services provided during a service. Refers to l2f_test_provisions through provision_id.

 

l2f_test_program_visit_activity_referrals

  • Selections of "Referred To" options on visits.

 

l2f_test_program_visit_activity_signature_records

  • More metadata for signatures for visits.

 

l2f_test_program_visit_household_notes

  • References specific visits to specific notes.

 

l2f_test_program_visit_signature_records

  • Older signature solution. No longer used, but contains historical data.

 

l2f_test_program_visit_visit_reasons

  • Visit reasons recorded with each visit. 

Very Specific Modules (Very Few Visits Create These)

l2f_test_program_visit_activity_delivery_details

  • New delivery system.

 

l2f_test_program_visit_activity_external_referrals

  • Ontario 211 Referral Integration.

 

l2f_test_program_visit_ffc_info

  • Houston Food for Change Program.

 

l2f_test_program_visit_household_member_school_attendance

  • School Pantry module.

 

l2f_test_program_visit_voucher_transactions

  • Voucher module.

 

l2f_test_program_visit_vouchers

  • Voucher module.

 

l2f_test_schools

  • School Pantry module.

Incremental Syncs and the Hard Delete Log

All entries in the Database Access Suite include a record for updated_at which can be used to do incremental pulls. For example, SELECT * FROM l2f_test_program_visit_activities WHERE updated_at > '2022-01-01' will pull only those records updated since Jan 1st, 2022.

However, the Link2Feed system was originally built with hard-deletes (eg- a visit recorded in error is removed entirely from the database when it is deleted by a user.) This leads to gaps and sync issues when pulling incremental data from the system, as there is no deleted_at or updated_at for these records, they are simply gone.

To provide insight into this process, Link2Feed has given each Database Access Suite user access to the hard_delete_log table, note that this table does not have a prefix. That table has the following columns:

  • table_name: the text name of the table, not-including the prefix (eg- program_visit_activities)

  • id: the ID reference within that table.

    • Note: for some many-to-many bridge tables there is not a single ID, but one of program_visit_activity_id or program_visit_household_member_id. In this case, the id column of hard_delete_log refers to the program_visit_activity_id or program_visit_household_member_id respectively. For example:

      • In l2f_test_program_visit_household_notes there is no id column, but there is a program_visit_activity_id column.

      • If a hard_delete_log entry has id of 12345 and table_name of “program_visit_household_notes” it means that all l2f_program_visit_household_notes with program_visit_activity_id 12345 have been deleted.

  • deleted_at: the Date and Time when the entry was deleted. This allows for incremental checks of deletions.

This allows for two processes to be setup:

  1. An incremental pull of every table, once per day, for example:

    SELECT * FROM l2f_test_program_visit_activities WHERE updated_at > DATE_SUB(NOW(), INTERVAL 1 DAY);
  2. An incremental check of records on that table that have been deleted:

    SELECT CONCAT('l2f_test', table_name) AS table_name, id, deleted_at FROM hard_delete_log WHERE table_name = 'program_visit_activities' AND deleted_at > DATE_SUB(NOW(), INTERVAL 1 DAY);

Sample Queries Against Database Access Suite

The “trunk” of the tree is the l2f_test_program_visit_activities table. Each record in this table represents a single row in the Link2Feed exports that your team has been receiving from the group.

 

Attached to each of those are a number of l2f_test_program_visit_household_members records, referring back by program_visit_activity_id, and these represent each of the clients. One of these records will be for the main client, which can be identified as their `client_id` matches the `client_id` of the l2f_test_program_visit_activities record.

 

There are also several tables like l2f_test_program_visit_household_member_ethnicity_types (and _self_identity_types, etc) which are demographic fields tied to individuals. They link to both an individual on a visit by a program_visit_household_member_id and also to an ethnicity_type_id that links over to the l2f_test_data_types table.

These relationships can be gleaned from the SQL and ERD documents, but I admit it’s hard to separate out the important bits.

So let’s look at a real example: finding how many duplicated clients with Hispanic/Latino heritage visited in 2020.

That would be:

SELECT COUNT(*) AS number_of_latino_visitors FROM l2f_test_program_visit_household_member_ethnicity_types AS pvhmet LEFT JOIN l2f_test_data_types AS dt   ON pvhmet.ethnicity_type_id = dt.id LEFT JOIN l2f_test_program_visit_household_members AS pvhm   ON pvhmet.program_visit_household_member_id = pvhm.id LEFT JOIN l2f_test_program_visit_activities AS pva   ON pvhm.program_visit_activity_id = pva.id WHERE dt.name = 'hispanic_latino'   AND pva.visit_date BETWEEN '2020-01-01' AND '2020-12-31'; 

And then let’s say we want the unique count, we would pull a COUNT(DISTINCT pvhm.client_id) instead. The client_id field is the consistent unique identifier for clients. That would be:

Digging deeper, let’s say we wanted to group those visits into how many clients are in the household. This is where we get a little grey-area and domain-specific. There are two important numbers that Link2Feed tracks: 1) the number of people who were actually recorded as part of the visit, and checked off as attending (which some of our users utilize very effectively, and some do not) and 2) the number of people who live in the household, whether or not they are part of the visit. #1 is grouping by how many l2f_test_program_visit_household_members records relate to the visit. #2 is grouping by the l2f_test_program_visit_activities.total_members field.

So that is either:

#1 – based on how many visitors were on the visit:

Or #2 – based on how many people live in the visitors’ household:

The second statistic is much more useful for CSFP visits, as CSFP visits are recorded with only one individual on them, so the only way to get those household details on CSFP visits is to use the total_members field.

On that note: One of the things both Reports and Exports within Link2Feed’s Intake system do is allow restricting by program type. To do that, we need to link in the l2f_test_programs table and restrict on its type field.

Video Demonstrations

Connecting with MySQL Workbench and MySQL Command Line Interface (CLI)

Connecting with DBeaver

Connecting and Using Tableau Desktop

Connecting with Azure Data Factory