Updated Architecture-and-Frameworks-.txt

Jun Matsushita authored
revision 26d07684f91d97ed6da4c943b069f8694e9500e6
Architecture-and-Frameworks-
# Mission v0.1

## Functional

- Request
  - Request Research - Text Box
  - (Monitor country searches)
- Collection
  - Create Companies - Registered name / OpenCorporates ID / Jurisdiction... Upload Documents
  - Create Ownership Link
- Accessing
  - Querying - Results on Graph
  - Navigate
  - Company Data - Side box / Sections
  - Link Data - Edge data (maybe tabs)
  - Map
- Analysing
  - Template Searches
  - Graph by Country
  - Size: Number of connections, Turnover
  - Filter upward / downward
  - Autozoom
- Publishing
  - Download data set (CSV? JSON? LD-CSV?)
  - Access control / User registration
  - Print current view

## Technical

- Database: Neo4j
- Framework/CMS: Structr
- Visualisation: d3
- Supported OC API subset
  - GET companies/:jurisdiction_code/:company_number
  - GET companies/search
  - GET companies/:jurisdiction_code/:company_number/filings
  - GET companies/:jurisdiction_code/:company_number/network
  - GET companies/:jurisdiction_code/:company_number/statements
  - GET placeholders/:id
  - GET placeholders/:id/network
  - GET placeholders/:id/statements
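
A minimal sketch of how the company endpoints in the supported subset could be turned into request URLs. The base URL, the helper names (`company_url`, `search_url`) and the `q` search parameter are assumptions for illustration; check them against the OC API documentation (including any version prefix) before relying on them. Nothing is sent over the network here.

```python
from urllib.parse import quote, urlencode

# Assumption: illustrative base URL, without a version prefix.
BASE_URL = "https://api.opencorporates.com"

# Path templates copied from the supported subset listed above.
COMPANY_ENDPOINTS = {
    "company": "companies/{jurisdiction_code}/{company_number}",
    "filings": "companies/{jurisdiction_code}/{company_number}/filings",
    "network": "companies/{jurisdiction_code}/{company_number}/network",
    "statements": "companies/{jurisdiction_code}/{company_number}/statements",
}


def company_url(kind, jurisdiction_code, company_number):
    """Build the URL for one company endpoint (hypothetical helper)."""
    path = COMPANY_ENDPOINTS[kind].format(
        jurisdiction_code=quote(jurisdiction_code, safe=""),
        company_number=quote(company_number, safe=""),
    )
    return f"{BASE_URL}/{path}"


def search_url(query):
    """Build a company search URL; the `q` parameter name is an assumption."""
    return f"{BASE_URL}/companies/search?{urlencode({'q': query})}"


network = company_url("network", "gb", "01004984")
search = search_url("BP Exploration")
```

Keeping the templates in one dictionary makes it easy to see which part of the OC API the application actually depends on.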

+ Contracts / Supply Chain??

# Data Model

## How OC Models the Data

https://blog.opencorporates.com/2014/01/08/understanding-corporate-networks-part-4-how-we-record-the-data/

In the core OpenCorporates database, every bit of data can be described as a:

STATEMENT about one or more COMPANIES or PLACEHOLDERS, each with a PROVENANCE

### Statements

* OC Statement
  * The data point. In the example above, this is “There is a subsidiary relationship that existed on December 31, 2012”. The data point is derived directly from information in the primary source.
  * The subject company or placeholder. In the example above, this is Facebook, Inc.
  * The object company or placeholder. In this case, Vitesse, LLC.
  * The verbs linking the respective companies to the data point. In this case, Facebook, Inc “has a subsidiary”, and Vitesse, LLC “is a subsidiary”. Internally, we call these “placeholder data links”.


* OC Statements about Networks
  * Subsidiaries: statements that X is a subsidiary of Y
  * Share Holdings: statements that X holds shares in Y
  * Acquisitions: statements that X acquired Y. Often these statements are derived from press releases, so are not considered as reliable as other kinds of statements.
  * Branches: entities permitted to operate in a jurisdiction but with a legal personality registered elsewhere.
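
The four parts of a statement listed above (data point, subject, object, and the verbs linking each party to the data point) can be sketched as a small record type. This is only an illustration of the shape, not OC's actual schema; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical record mirroring the four parts of an OC statement."""
    data_point: str    # the fact taken from the primary source
    subject: str       # subject company or placeholder
    subject_verb: str  # how the subject relates to the data point
    obj: str           # object company or placeholder
    object_verb: str   # how the object relates to the data point

    def links(self):
        """Return the (entity, verb) pairs, i.e. the 'placeholder data links'."""
        return [(self.subject, self.subject_verb), (self.obj, self.object_verb)]


example = Statement(
    data_point="There is a subsidiary relationship that existed on December 31, 2012",
    subject="Facebook, Inc.",
    subject_verb="has a subsidiary",
    obj="Vitesse, LLC",
    object_verb="is a subsidiary",
)
```

Calling `example.links()` gives one pair per party, which is the information a graph edge between the two companies would need to carry.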

https://opencorporates.com/companies/us_de/3835815/statements
![Statements](images/screen-shot-2014-06-12-at-065217.png)

### Provenance

![Provenance Object](images/screen-shot-2014-06-12-at-064313.png)

The provenance object is gradually replacing the source object, and provides a greater granularity of data, as well as additional attributes such as confidence and, sometimes, log messages.

### Source

![Source object](images/screen-shot-2014-06-12-at-064401.png)

## OC API

- GET versions
- GET companies/:jurisdiction_code/:company_number
- GET companies/search
- GET companies/:jurisdiction_code/:company_number/filings
- GET companies/:jurisdiction_code/:company_number/network
- GET companies/:jurisdiction_code/:company_number/statements
- GET companies/:jurisdiction_code/:company_number/data
- GET officers/search
- GET officers/:id
- GET corporate_groupings/:name
- GET corporate_groupings/search
- GET filings/:id
- GET data/:id
- GET statements/:id
- GET placeholders/:id
- GET placeholders/:id/network
- GET placeholders/:id/statements
- GET account_status

## Linked Open Data

### Open Provenance

http://openprovenance.org/model/opmx#example

![](http://openprovenance.org/model/dependencies.jpg)


# Data Model Roadmapping

## Notes about current data

Use ISO 8601 for dates.
Timestamps for Companies are difficult if we want to avoid dealing with multiple nodes. Maybe a calculated (or programmatically updated) last_updated would be useful?
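
A rough sketch of what normalising incoming dates to ISO 8601 could look like at import time. The list of accepted input formats is an assumption (in particular, slash-separated dates are read day-first), and the function name `to_iso8601` is hypothetical. Partial dates such as a year or a year and month are kept at their original precision rather than padded.

```python
from datetime import datetime

# (input format, output format) pairs, tried in order.
# Assumption: slash-separated dates are day-first (DD/MM/YYYY).
FORMATS = (
    ("%Y-%m-%d", "%Y-%m-%d"),
    ("%d %b %Y", "%Y-%m-%d"),
    ("%d/%m/%Y", "%Y-%m-%d"),
    ("%Y-%m", "%Y-%m"),  # year and month only
    ("%Y", "%Y"),        # year only
)


def to_iso8601(value):
    """Return an ISO 8601 date string, or '' for an empty value."""
    text = value.strip()
    if not text:
        return ""
    for in_fmt, out_fmt in FORMATS:
        try:
            return datetime.strptime(text, in_fmt).strftime(out_fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")
```

Raising on unknown formats, rather than guessing, keeps ambiguous values visible so they can be fixed in the source spreadsheet.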

### BP

What is the Status column?

### Nigeria

Only one data point for Date Acquired: ignoring for now.
The Notes on Ownership column seems to contain an "ownership_type".
Employees only has a couple of data points: ignoring for now.

## v0.1a

In this model, nodes are seen as facts and relationships carry inline provenance annotations.
Source data is ignored for now for the Company properties address, key people and foundation year.
Structr provides a number of workflow-related default properties for all types (createdDate, lastModifiedDate, createdBy, Owner,...)

### Nodes

CREATE (parent:Company { name: 'BP Exploration Operating Company Limited'})
CREATE (company:Company { name: 'BP Exploration (Epsilon) Limited', oc_id: 'gb/01004984', headquarters:'', people:'', founded_in:'1992', website:''})
CREATE (contractor:Company { name: 'Hamilton Technologies Limited'})
CREATE (juris:Country { name: 'Bahamas'})
CREATE (llc:CompanyType { name:'Limited Liability Company'})
CREATE (contract:Contract { name: 'Supply of bulk methanol', description:'', value_usd:'', value_currency:'10MEUR', announced_at:'2012-08', started_at:'2013', ended_at: '', duration_months:'48', field:'', license_area:''})

### Relationships

#### Factual

CREATE (parent)-[owns:IS_OWNER { immediate:'100', ultimate:'100', ownership_type:'', source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(company)

CREATE (company)-[jur:HAS_JURISDICTION {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(juris)

CREATE (company)-[contracts:CONTRACTS {source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(contract)

CREATE (contract)-[has_contr:HAS_CONTRACTOR {source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(contractor)

CREATE (company)-[founded:HAS_TYPE {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(llc)

#### Cardinality

Company N -[:IS_OWNER]-> N Company
Company N -[:HAS_JURISDICTION]-> 1 Jurisdiction
Company N -[:IS_FOUNDED]-> 1 Year
Company N -[:HAS_TYPE]-> 1 Type
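
The N-to-1 cardinalities above are not enforced by the graph itself, so one option is to check them on the data before (or after) import. The sketch below assumes edges are available as plain `(source, relationship type, target)` tuples; the function name and that tuple shape are assumptions for illustration.

```python
from collections import defaultdict

# Relationship types that allow only one target per company,
# taken from the cardinality list above.
SINGLE_TARGET = {"HAS_JURISDICTION", "IS_FOUNDED", "HAS_TYPE"}


def cardinality_violations(edges):
    """Return {(source, rel_type): [targets]} where more than one target exists."""
    targets = defaultdict(set)
    for source, rel_type, target in edges:
        if rel_type in SINGLE_TARGET:
            targets[(source, rel_type)].add(target)
    return {key: sorted(found) for key, found in targets.items() if len(found) > 1}


edges = [
    ("BP Exploration (Epsilon) Limited", "HAS_JURISDICTION", "Bahamas"),
    ("BP Exploration (Epsilon) Limited", "HAS_JURISDICTION", "Nigeria"),
    ("BP Exploration (Epsilon) Limited", "HAS_TYPE", "Limited Liability Company"),
    ("BP Exploration Operating Company Limited", "IS_OWNER", "BP Exploration (Epsilon) Limited"),
]
violations = cardinality_violations(edges)
```

IS_OWNER is N-to-N, so it is deliberately left out of the check. The second jurisdiction in the sample edges is made up to trigger a violation.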

### Notes

In this model, provenance info is not expressed about companies. On the other hand, all relationship data is tagged with provenance data.
![screen-shot-2014-06-19-at-143308.png](images/screen-shot-2014-06-19-at-143308.png)

### Grass file

node {
diameter: 40px;
color: #DFE1E3;
border-color: #D4D6D7;
border-width: 2px;
text-color-internal: #000000;
caption: '{id}';
font-size: 10px;
}

relationship {
color: #D4D6D7;
shaft-width: 1px;
font-size: 8px;
padding: 3px;
text-color-external: #000000;
text-color-internal: #FFFFFF;
}

node.Company {
color: #F25A29;
border-color: #DC4717;
text-color-internal: #FFFFFF;
diameter: 80px;
border-width: 2px;
caption: '{name}';
font-size: 10px;
}

node.Country {
color: #AD62CE;
border-color: #9453B1;
text-color-internal: #FFFFFF;
diameter: 80px;
border-width: 2px;
caption: '{name}';
font-size: 10px;
}

node.Confidence {
color: #FCC940;
border-color: #F3BA25;
text-color-internal: #000000;
diameter: 40px;
border-width: 2px;
caption: '{level}';
font-size: 10px;
}

node.CompanyType {
color: #30B6AF;
border-color: #46A39E;
text-color-internal: #FFFFFF;
diameter: 50px;
border-width: 2px;
caption: '{name}';
font-size: 10px;
}

node.Contract {
color: #FF6C7C;
border-color: #EB5D6C;
text-color-internal: #FFFFFF;
diameter: 80px;
border-width: 2px;
caption: '{name}';
font-size: 10px;
}

node.ContractType {
color: #4356C0;
border-color: #3445A2;
text-color-internal: #FFFFFF;
diameter: 40px;
border-width: 2px;
caption: '{name}';
font-size: 10px;
}

node.Document {
color: #DFE1E3;
border-color: #D4D6D7;
text-color-internal: #000000;
diameter: 40px;
border-width: 2px;
caption: '{name}';
font-size: 10px;
}


### Nodes

CREATE (parent:Company { name: 'BP Exploration Operating Company Limited'})
CREATE (statecompany:Company { name: 'NNPC Nigeria'})
CREATE (company:Company { name: 'BP Exploration (Epsilon) Limited', aliases: 'BP Exp, BP Exp LTD', oc_id: 'gb/01004984', headquarters:'', people:'', founded_in:'1992', website:'', document:''})
CREATE (contractor:Company { name: 'Hamilton Technologies Limited'})
CREATE (operator:Company { name: 'SNEPCO'})
CREATE (bah:Country { name: 'Bahamas'})
CREATE (nig:Country { name: 'Nigeria'})
CREATE (llc:CompanyType { name:'Limited Liability Company'})
CREATE (servicecontract:Contract { name: 'Supply of bulk methanol', official_title:'', description:'', value_usd:'', value_currency_amount:'10000000', value_currency_unit: 'EUR', announced_at:'2012-08', started_at:'2014', ended_at: '', duration_months:'48', field:'', license_area:''})
CREATE (statprodcontract:Contract { name: 'Nigeria OML 120 - 121', official_title:'', description:'', value_usd:'', value_currency_amount:'10000000', value_currency_unit: 'EUR', announced_at:'2012-08', started_at:'2014', ended_at: '', duration_months:'48', field:'', license_area:''})
CREATE (prodcontract:Contract { name: 'Another Nigeria Indirect Prod Contract', official_title:'', description:'', value_usd:'', value_currency_amount:'10000000', value_currency_unit: 'EUR', announced_at:'2012-08', started_at:'2014', ended_at: '', duration_months:'48', field:'', license_area:''})

/*
- Primary (where the government gives permission to extract something from planet earth): i.e. 1/ Production Sharing Contracts (maybe 2/ Concession - 3/ Primary Service Contract, as distinct from Service Contract) linked to Territory. (They have ISSUES relationships - a Country can issue and a State-owned Company can issue - and HAS_CONTRACTOR and HAS_OPERATOR relationships.) Different Primary Contracts can be captured as Contract Type.

- Secondary (everything else).
- 4/Service Contract.
*/

CREATE (productionsharingtype:ContractType { name:'Production Sharing Contract'})
CREATE (concessioncontracttype:ContractType { name:'Concession Contract'})
CREATE (primaryservicecontracttype:ContractType { name:'Primary Service Contract'})
CREATE (servicecontracttype:ContractType { name:'Service Contract'})

CREATE (doc:Document { name:'', summary: '', raw:'', file:''})

//ContractType???
//Linking documents
CREATE (company)-[hasdoc:HAS_DOCUMENT]->(doc)
CREATE (servicecontract)-[hascontractdoc:HAS_DOCUMENT]->(doc)

// Relationships

// Factual

CREATE (parent)-[owns:IS_OWNER { immediate:'100', ultimate:'100', ownership_type:'', start_date:'', end_date:'', source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(company)

CREATE (company)-[jur:HAS_JURISDICTION {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(bah)

// Company type

CREATE (company)-[hastype:HAS_TYPE {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(llc)

// Service Contracts

CREATE (company)-[issuservice:ISSUES {source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(servicecontract)

CREATE (servicecontract)-[hascontr:HAS_CONTRACTOR {contract_share:'', source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(contractor)

CREATE (servicecontract)-[haservicescontracttype:CONTRACT_TYPE {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'', log_message: ''}]->(servicecontracttype)

// Production Sharing Contracts

// Country state issuing.

CREATE (nig)-[issuprodstate:ISSUES {source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(statprodcontract)

CREATE (statprodcontract)-[hasoper:HAS_OPERATOR {contract_share:'', source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(operator)

CREATE (statprodcontract)-[hasstateprodcontracttype:CONTRACT_TYPE {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'', log_message: ''}]->(productionsharingtype)

// State owned company issuing

CREATE (nig)-[stateowns:IS_OWNER { immediate:'100', ultimate:'100', ownership_type:'', start_date:'', end_date:'', source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(statecompany)

CREATE (statecompany)-[issu:ISSUES {source_url: 'http://www.nestoilgroup.com/projects.php', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(prodcontract)

CREATE (prodcontract)-[hasprodcontracttype:CONTRACT_TYPE {source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-05-15', confidence: 'high', source_type:'secondary', log_message: ''}]->(productionsharingtype)

#### Cardinality

Company N -[:IS_OWNER]-> N Company
Company N -[:HAS_JURISDICTION]-> 1 Jurisdiction
Company N -[:IS_FOUNDED]-> 1 Year
Company N -[:HAS_TYPE]-> 1 Type

### Notes

In this model, provenance info is not expressed about companies. On the other hand, all relationship data is tagged with provenance data.

Company Names
- Display Name (is calculated)
- name (Legal Name):
- other_names : Other Names: "ABDSA, BSDLK , AKJSDK"
- previous_names: "ASLDKADS"

Open Corporates:
- Provenance differences: the source_type enumeration will probably be different.

Open Oil TODO:
- Date columns (announced_at:'2012-08', started_at:'2014', ended_at: '', duration_months:'48')
- Currency columns (amount, unit)
- Contract (name, title, description)
- Ownership
- People (New line separator)
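
For the currency columns item, a possible way to split a combined value such as '10MEUR' into an amount and a unit. The K/M/B multiplier suffixes and the three-letter unit are assumptions about how the source data is written, and `split_currency` is a hypothetical helper name.

```python
import re
from decimal import Decimal

# Assumption: optional K/M/B multiplier followed by a three-letter currency code.
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMB]?)\s*([A-Z]{3})\s*$")


def split_currency(raw):
    """Split e.g. '10MEUR' into ('10000000', 'EUR'); both parts are strings."""
    match = PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unrecognised currency value: {raw!r}")
    number, suffix, unit = match.groups()
    amount = Decimal(number) * MULTIPLIERS[suffix]
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return format(amount.normalize(), "f"), unit
```

Because the unit must be exactly three letters, a value like '10MXN' is read as 10 MXN rather than as a multiplier followed by a two-letter code.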

Known Limitations
- People are not nodes
- We don't track provenance of HQ, Founded date, People.


## v0.1b - Provenance Model

In this model, provenance records are nodes (as in OC). This requires a "proto statement" model in which even ownership is a node, so that provenance can be linked to it: a relationship cannot point at another relationship, so the relationship has to be reified.

### Nodes

CREATE (company:Company { name: 'BP Exploration (Epsilon) Limited', oc_id: 'gb/01004984'})
CREATE (parent:Company { name: 'BP Exploration Operating Company Limited'})
CREATE (juris:Country { name: 'Bahamas'})
CREATE (ownership:Ownership { immediate:'100', ultimate:'100', ownership_type:''})
CREATE (year92:Year { year:'1992'})

### Relationships

#### Factual

CREATE (parent)-[owns:IS_OWNER]->(ownership)
CREATE (ownership)-[owned:IS_OWNED]->(company)
CREATE (company)-[jur:JURISDICTION]->(juris)
CREATE (company)-[founded:FOUNDED]->(year92)

#### Provenance

CREATE (provOC:Provenance { source_url: 'https://opencorporates.com/companies/gb/01004984', source_date: '2014-06-15', confidence: 'high', source_type:'secondary', created_by:'jun', log_message: '', created_at: '2014-06-19'})
CREATE (company)-[provCompany:PROVENANCE]->(provOC)

CREATE (provOwnership:Provenance { source_url: 'https://docs.google.com/file/d/0B5ORBm2amqZSY2NvOFQwbTZvemM', source_date: '2013-06-07', confidence: 'high', source_type:'external', created_by:'jun', log_message: '', created_at: '2014-06-19'})
CREATE (ownership)-[provOwns:PROVENANCE]->(provOwnership)

// Assumes a contract node has already been created, as in v0.1a.
CREATE (contract)-[provContract:PROVENANCE]->(provOwnership)

### Notes

Ownership is a node to allow linking it to a Provenance. (See Nigeria Company Ownership spreadsheet: ExxonMobil Corporation -> Mobil Producing Nigeria Unlimited)
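
To make the difference from v0.1a concrete, here is a sketch of how one IS_OWNER relationship with inline provenance properties could be split into an Ownership node, a Provenance node and the three relationships between them. It works on plain dictionaries and tuples rather than on the database; the function name, the `label` key and the set of provenance keys are assumptions for illustration.

```python
# Assumption: these property names mark the provenance part of a v0.1a relationship.
PROVENANCE_KEYS = ("source_url", "source_date", "confidence", "source_type", "log_message")


def reify_ownership(parent, company, props):
    """Split one inline-provenance IS_OWNER edge into nodes and relationships."""
    ownership = {"label": "Ownership"}
    provenance = {"label": "Provenance"}
    for key, value in props.items():
        target = provenance if key in PROVENANCE_KEYS else ownership
        target[key] = value
    nodes = {"ownership": ownership, "provenance": provenance}
    relationships = [
        (parent, "IS_OWNER", "ownership"),
        ("ownership", "IS_OWNED", company),
        ("ownership", "PROVENANCE", "provenance"),
    ]
    return nodes, relationships


nodes, relationships = reify_ownership(
    "parent",
    "company",
    {
        "immediate": "100",
        "ultimate": "100",
        "ownership_type": "",
        "source_url": "https://opencorporates.com/companies/gb/01004984",
        "confidence": "high",
    },
)
```

One relationship becomes two nodes and three relationships, which is the cost of being able to attach provenance to the ownership fact.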

## v0.2

Triplestore