Celgene Associate Director, Data Architect in Summit, New Jersey


Celgene is a global biopharmaceutical company leading the way in medical innovation to help patients live longer, better lives. Our purpose as a company is to discover and develop therapies that will change the course of human health. We value our passion for patients, quest for innovation, spirit of independence and love of challenge. With a presence in more than 70 countries - and growing - we look for talented people to grow our business, advance our science and contribute to our unique culture.


Celgene has established a Big Data capability that provides actionable insights, informs decisions throughout the product life cycle, and helps improve patients' lives.

Celgene’s modern Big Data Platform includes a Data Lake that stores data and provides easy, secure access to the data needed by various functions for reporting and analytics.

The Data Architect is responsible for the Big Data information architecture including data sourcing, data flows, data standards, taxonomies/ontologies, mapping of data to common/industry data models as well as creating common data models for various subject areas to support cross-functional analytical use cases.

The role works closely with the Business Data Stewards, the Data Lake Manager and the team responsible for data ingestions/integration.

Responsibilities include, but are not limited to, the following:

•Define requirements for the ingestion of new data sources, including lifecycle, data quality checks, transformations and metadata enrichment

•Define data transformation rules for integrating data sources into common/standard data models (e.g. OMOP)

•Develop and iteratively evolve common data models for subject areas to anticipate and satisfy multiple cross-functional use cases while meeting the desired performance and scalability

•Create an inventory of all fit-for-purpose data marts/repositories being created in the Big Data Platform. Provide oversight from a data modeling and data architecture perspective to establish authoritative sources by subject area and promote reuse.

•Analyze legacy data pipelines and assist team with migrating them to the Big Data platform

•Create/drive adoption of data naming conventions, data definitions (e.g. business glossaries), dictionaries, taxonomies/ontologies and standards for data structures used in the Big Data environment

•Define data sourcing approach and identify proper data source systems / authoritative sources

•Ensure appropriate processes and automation are in place to manage quality and lifecycle of the data and metadata

•Ensure data lineage and metadata are easily available to users in the Big Data Platform and in the Enterprise Metadata Management solution

•Ensure logical and physical data models are available and up-to-date for the data integrated into the platform

•Regularly review data quality reports and recommend remediation if necessary

•Provide subject-matter expertise to Data Scientists, the Data Lake Manager, Data Engineers and Solution Architects who want to leverage data available in the Data Lake

•Provide oversight/consultancy to tenants of the Big Data Platform who are deploying new workloads/solutions, and participate in the architecture governance process


•Span of Control – Global, cross-functional, Big Data Analytics platform

•Direct Reports – None

•Indirect Reports – Will direct the work of implementation partners and a matrix group within IT of approximately 1-5 people

•Budgetary Responsibility – None

•Interacts with – Data Scientists, Business Data Stewards, Data Engineers, Solution Architects, External Vendors



Skills/Knowledge Required:

§ Bachelor's degree in computer science, system analysis or a related study, or equivalent experience

§ Minimum of 5-7 years working in an Information Management (or equivalent) role, preferably in a global biotech/pharmaceutical organization, focusing on the following disciplines:

o Information / Data Architecture

o Data Modeling – specifically to support data analysis and reporting in big data environments

o Data Quality Management including Data Profiling (Informatica Data Quality tools experience a plus)

o Data Analysis

§ Minimum of 5-7 years of combined hands-on experience with the following techniques and tools:

o Various data modeling types – conceptual, logical, relational, dimensional, canonical (messaging, XML, JSON), document – especially with NoSQL databases – and experience with data modeling tools (ER Studio experience a plus)

o Ability and willingness to be hands-on when necessary, including use of SQL, ETL tools, scripts, etc. to tune performance and data transformation routines

o Creating data flow documentation and business process modeling

o Data integration design and development

o Leveraging a Big Data capability and understanding the schema-on-read approach and implications

o Metadata Management (especially experience with ASG Rochade, Cloudera Navigator, Hive Metastore)


§ Other required experience and attributes:

o Experience with working in a matrixed organization

o Ability to quickly learn new technologies

o Excellent interpersonal skills in areas such as teamwork, influence, facilitation and negotiation

o Problem solver with a demonstrated ability to “think outside the box”

o Strong written and verbal communication skills, including the ability to explain complex technical issues in simple, business-friendly language

o Excellent planning and organizational skills

o Demonstrated ability to work well with others and be respected as a leader

o Thoughtful, extroverted and collaborative

o Motivation that is focused on long-term results

§ Experience in the following areas is a plus:

o Understanding of Data Science activities and lifecycle

o Knowledge of key Bio-Pharmaceutical Industry data standards (e.g. OMOP, CDISC SDTM, HL7) and key data providers (e.g. Flatiron, Truven, Optum, IMS Health)

Celgene is committed to equal opportunity in the terms and conditions of employment for all employees and job applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity or gender expression, national origin, disability or veteran status.

Celgene complies with all applicable national, state and local laws governing nondiscrimination in employment as well as employment eligibility verification requirements of the Immigration and Nationality Act. All applicants must have authorization to work for Celgene in the U.S.

Associate Director, Data Architect

Location: Summit, NJ, US

Job ID: 17001172