
Reverse Engineering for Reuse

Univ.Prof. Dipl.-Ing. Dr. techn.

Harald GALL

Universität Zürich
Technische Universität Wien

(c) 2006, H.Gall Reverse Engineering.2

Motivation for Reverse Engineering

• Existing software that is in successful use

• High investment in existing software
• Good solutions

• Goals:
☞ exploit the reuse potential
☞ extract components for reuse
☞ increase the degree of reuse

Reverse Engineering - Definition

• Analysis of an existing system with the goal of
☞ identifying the components and their relationships

☞ creating representations at a higher level of abstraction

• No change to the system!
• Applies to all phases of the software life cycle
• Subprocesses:

☞ Redocumentation

☞ Design Recovery

Redocumentation

• Creation or revision of semantically equivalent representations of the system at the same level of abstraction

• e.g.
☞ redocumentation of the source code (code - code)

☞ redocumentation of the design (design - design)

Design Recovery

• Additional information (knowledge about the system and its application domain) is used to generate abstractions.

• Representations:
☞ data flow diagrams, control flow diagrams
☞ informal descriptions of the software system and its domain (diagrams, text, etc.)

• The result is semantically rich representations (abstractions) of the system

Restructuring

• Transformation from one representation into another at the same level of abstraction.

• The functionality and semantics of the software system are not changed!

• e.g.
☞ restructuring the source code into new logical units (modules)
☞ restructuring the design into modified components

Refactoring

• Refactoring is a technique to restructure code in a disciplined way.

• For a long time it was a piece of programmer knowledge, done with varying degrees of discipline by experienced developers, but not passed on in a coherent way.

• Martin Fowler’s “www.refactoring.com”
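
A minimal, hypothetical illustration in C (the language of the code excerpts later in these slides) of one disciplined refactoring step in the style of Fowler's "Extract Function": the computation stays exactly the same, only its structure and names change. The pricing scenario, the 10% rule and all function names are assumptions made for this example.

```c
#include <stddef.h>

/* Before: one function both sums the prices and applies a discount. */
double total_before(const double *prices, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += prices[i];
    if (sum > 100.0)
        sum *= 0.9;
    return sum;
}

/* After: each step is extracted into a function with an intention-revealing name. */
static double sum_prices(const double *prices, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += prices[i];
    return sum;
}

static double apply_bulk_discount(double amount)
{
    return amount > 100.0 ? amount * 0.9 : amount;   /* hypothetical 10% rule */
}

double total_after(const double *prices, size_t n)
{
    return apply_bulk_discount(sum_prices(prices, n));
}
```

Because a refactoring must not change behaviour, a test that compares total_before and total_after on the same inputs is the natural safety net for such a step.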

Reverse Engineering Tools

• Tool overview: http://scgwiki.iam.unibe.ch:8080/SCG/370
• CodeCrawler: http://www.iam.unibe.ch/~scg/Research/CodeCrawler/index.html
• Imagix4D: http://www.imagix.com/
• Rigi: http://www.rigi.csc.uvic.ca/
• XGvis: http://www.research.att.com/areas/stat/xgobi/
• IBM Structured Analysis Tool: http://www.alphaworks.ibm.com/tech/sa4j
• Java Clone Detection CloneDR: http://www.semdesigns.com/Products/Clone/download.asp

Re-Engineering

• Modification of the software system in order to reimplement it in a changed form. New requirements are also taken into account.

• Re-Engineering := Reverse Engineering + Δ + Forward Engineering

Reverse Engineering Terminology

[Chikofsky/Cross, 1990]

[Figure: the levels Requirements, Design and Implementation, linked by forward engineering (towards implementation) and by reverse engineering and design recovery (back towards requirements); re-engineering (renovation) between adjacent levels; restructuring at each level, plus redocumentation at the implementation level]

Example: Multitasking Window System

• Knowledge about key structures such as:
☞ process table, window table, window management module, process management module, etc.

• Search the domain model for these concepts and instantiate them to obtain an architectural overview

[Figure: domain model MultitaskingWindowManager with the parts Process table, Window table, Process Management Module, Window Management Module, ...]

Semantic Information in the Code

#include <stdio.h>
#include "h0001.h"
#include "h0002.h"
#include "h0003.h"

f0001(a0001)
unsigned int a0001;
{
    unsigned int i0001;
    f0002(g0005, d0001, d0002);
    f0002(a0001, d0003, d0002);
    f0003(g0001[a0001].s0001, g0001[a0001].s0002);
    g0006 = a0001;
    i0001 = g0001[a0001].s0003;
    if (! f0004(i0001) && (g0002->g0003)[i0001].s0004 == d0004)
        f0005(i0001);
}

Semantic Information in the Code /2

#include <stdio.h>
#include "proc.h"
#include "windows.h"
#include "globdefs.h"

change_window(nw)
unsigned int nw;
{
    unsigned int pn;
    border_attribute(cwin, NORM_ATTR, INV_ATTR);
    border_attribute(nw, NORMHLIT_ATTR, INV_ATTR);
    move_cursor(wintbl[nw].crow, wintbl[nw].ccol);
    cwin = nw;
    pn = wintbl[nw].pnumb;
    if (! outrange(pn) && (g->proctbl)[pn].procstate == SUSPENDED)
        resume(pn);
}

Semantic Information in the Code /3

#include <stdio.h>
#include "proc.h"
#include "windows.h"
#include "globdefs.h"

change_window(nw)   /* change current window to window nw */
unsigned int nw;    /* number of target window */
{
    unsigned int pn;
    /* restore border of current window to un-highlighted */
    border_attribute(cwin, NORM_ATTR, INV_ATTR);
    /* highlight border of new current window */
    border_attribute(nw, NORMHLIT_ATTR, INV_ATTR);
    /* move physical cursor to new window where cursor was left
       and make nw the current window */
    move_cursor(wintbl[nw].crow, wintbl[nw].ccol);
    cwin = nw;
    /* resume the process associated with the new window if it is suspended */
    pn = wintbl[nw].pnumb;
    if (! outrange(pn) && (g->proctbl)[pn].procstate == SUSPENDED)
        resume(pn);
}

Abstraction-to-Code Mapping

• Direct binding:
☞ association via linguistic idioms

• Indirect binding:
☞ association via substructures

Abstract Design Idioms in the Domain Model (DM)

[Figure: domain model concepts process table, window table, table, queue, process, data, ...]

• linguistic idiom (process table): [ pr.c | prc ] [ .? | .. | ... | .... ] [t.b | tbl ]

• data object idiom (process): ProcessNumber, ProcessState, ProcessName, Location of saved Envir., Location of Process, ...

Search Using Linguistic Idioms

process proctbl [MAXPROCS]; /* process table array */
........
typedef struct procentry /* process table entry */
{
    unsigned int savesp;       /* save sp register */
    unsigned int savess;       /* save ss register */
    unsigned int pspseg;       /* PSP seg addr this proc */
    unsigned int windno;       /* window number this proc */
    unsigned int procstate;    /* process state */
    char procname[MAXPNAME+1]; /* process name */
    int pnum;                  /* process number for this entry */
    ......
} process;

Substructure Bindings in the Source Code

process proctbl [MAXPROCS]; /* process table array */
........
typedef struct procentry /* process table entry */
{
    unsigned int savesp;       /* save sp register */
    unsigned int savess;       /* save ss register */
    unsigned int pspseg;       /* PSP seg addr this proc */
    unsigned int windno;       /* window number this proc */
    unsigned int procstate;    /* process state */
    char procname[MAXPNAME+1]; /* process name */
    int pnum;                  /* process number for this entry */
    ......
} process;

[Figure: fields of struct procentry bound to the data object idiom concepts ProcessNumber, ProcessState, ProcessName, Location of saved Envir. and Location of Process]
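
Indirect binding can be approximated by counting how many sub-concepts of a data object idiom are realised by the fields of a struct. A sketch under stated assumptions: the regular expressions for ProcessNumber, ProcessState and ProcessName and the threshold of two matched sub-concepts are hypothetical choices, and POSIX <regex.h> is assumed to be available.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field-name patterns for the sub-concepts of "process". */
static const char *SUBCONCEPTS[] = {
    "p(roc)?num",    /* ProcessNumber */
    "(proc)?state",  /* ProcessState  */
    "(proc)?name",   /* ProcessName   */
};
#define N_SUBCONCEPTS (sizeof SUBCONCEPTS / sizeof SUBCONCEPTS[0])

static bool field_matches(const char *pattern, const char *field)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return false;
    bool hit = regexec(&re, field, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/* Counts how many sub-concepts are realised by at least one field. */
size_t count_bound_subconcepts(const char *const fields[], size_t n_fields)
{
    size_t bound = 0;
    for (size_t c = 0; c < N_SUBCONCEPTS; c++)
        for (size_t f = 0; f < n_fields; f++)
            if (field_matches(SUBCONCEPTS[c], fields[f])) {
                bound++;
                break;                         /* one field per concept suffices */
            }
    return bound;
}

/* Binds the struct to "process table entry" if at least two sub-concepts match. */
bool binds_to_process_entry(const char *const fields[], size_t n_fields)
{
    return count_bound_subconcepts(fields, n_fields) >= 2;
}
```

For the fields of procentry, pnum, procstate and procname each bind one sub-concept, so the struct is bound to the process concept; a struct with only a pnumb field would not be.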

Resulting Partial Mapping

[Figure: the domain model MultitaskingWindowManager (Process table, Window table, Process Management Module, Window Management Module, ...) partially mapped to code elements: Process table def., Process state, Process name, Process number]

Model-Based Design Recovery

Object-Oriented Reengineering Patterns

Serge Demeyer, Stéphane Ducasse, Oscar Nierstrasz

www.iam.unibe.ch/~scg/OORP

(c) Demeyer, Ducasse, Nierstrasz Reverse Engineering.22

Reverse Engineering Patterns
• What and Why
• Setting Direction

☞ Most Valuable First

• First Contact
☞ Chat with the Maintainers
☞ Interview during Demo

• Initial Understanding
☞ Analyze the Persistent Data
☞ Study Exceptional Entities

• Detailed Model Capture
☞ Tie Code and Questions
☞ Step through the Execution
☞ Look for the Contracts

• Conclusion

What and Why?

Definition
Reverse Engineering is the process of analysing a subject system
☞ to identify the system's components and their interrelationships and
☞ create representations of the system in another form or at a higher level of abstraction.
(Chikofsky & Cross, '90)

Motivation
Understanding other people's code (cf. newcomers in the team, code reviewing, original developers left, ...)

Generating UML diagrams is NOT reverse engineering ... but it is a valuable support tool

The Reengineering Life-Cycle

[Figure: the reengineering life-cycle over Requirements, Designs and Code: (0) requirement analysis, (1) model capture, (2) problem detection, (3) problem resolution, (4) program transformation]

Issues in (0) requirement analysis and (1) model capture:
• scale
• speed
• accuracy
• politics

Forces — Setting Direction

• Conflicting interests (technical, ergonomic, economic, political)

• Presence/absence of the original developers

• Legacy architecture

• Which problems to tackle?
☞ Interesting vs. important problems?
☞ Wrap, refactor or rewrite?

Setting Direction

Principles & guidelines for software project management, especially relevant for reengineering projects:

• Set direction: Agree on Maxims
• Maintain direction: Appoint a Navigator
• Coordinate direction: Speak to the Round Table
• Where to start: Most Valuable First
• What to do: Fix Problems, Not Symptoms
• What not to do: If It Ain't Broke, Don't Fix It
• How to do it: Keep it Simple

Most Valuable First

Problem: Which problems should you focus on first?

Solution: Work on aspects that are most valuable to your customer
• Maximize commitment, early results; build confidence
• Difficulties and hints:
☞ Which stakeholder do you listen to?
☞ What measurable goal to aim for?
☞ Consult change logs for high activity
☞ Play the Planning Game
☞ Wrap, refactor or rewrite? (see Fix Problems, Not Symptoms)
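
Consulting the change logs for high activity can start as simply as counting how often each file is mentioned. A sketch assuming a hypothetical, already parsed log in which every entry names the single file it changed; the file names used below are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index in `files` of the file named by the most log entries,
 * or -1 if no entry names any of them. Ties keep the earlier file. */
int most_changed_file(const char *const entries[], size_t n_entries,
                      const char *const files[], size_t n_files)
{
    int best = -1;
    size_t best_count = 0;
    for (size_t f = 0; f < n_files; f++) {
        size_t count = 0;
        for (size_t e = 0; e < n_entries; e++)
            if (strcmp(entries[e], files[f]) == 0)
                count++;                       /* one more change to this file */
        if (count > best_count) {
            best_count = count;
            best = (int)f;
        }
    }
    return best;
}
```

With the log entries proc.c, windows.c, proc.c and the candidate files windows.c, proc.c, globdefs.h, the function returns 1, i.e. proc.c.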

Forces — First Contact

• Legacy systems are large and complex
☞ Split the system into manageable pieces

• Time is scarce
☞ Apply lightweight techniques to assess feasibility and risks

• First impressions are dangerous
☞ Always double-check your sources

• People have different agendas
☞ Build confidence; be wary of skeptics

First Contact

Feasibility assessment (one week time)

• Talk about it: system experts (verify what you hear)
☞ Talk with developers: Chat with the Maintainers
☞ Talk with end users: Interview during Demo

• Read about it: Skim the Documentation

• Software system
☞ Read it: Read All the Code in One Hour
☞ Compile it: Do a Mock Installation

Chat with the Maintainers

Problem: What are the history and politics of the legacy system?

Solution: Discuss the problems with the system maintainers.

• Documentation will mislead you (various reasons)
• Stakeholders will mislead you (various reasons)

• The maintainers know both the technical and political history

Chat with the Maintainers

Questions to ask:
• Easiest/hardest bug to fix in recent months?
• How are change requests made and evaluated?
• How did the development/maintenance team evolve during the project?
• How good is the code? The documentation?
• Why was the reengineering project started? What do you hope to gain?

The major problems of our work are not so much technological as sociological.

(DeMarco and Lister, Peopleware '99)

Read all the Code in One Hour

Problem: How can you get a first impression of the quality of the source code?

Solution: Scan all the code in a single, short session.
• Use a checklist (code review guidelines, coding styles etc.)
• Look for functional tests and unit tests
• Look for abstract classes and root classes that define domain abstractions
• Beware of comments
• Log all your questions!

I took a course in speed reading and read "War and Peace" in twenty minutes. It's about Russia.

(Woody Allen)

Interview during Demo

Problem: What are the typical usage scenarios?

Solution: Ask the user!

• ... however
☞ Which user?
☞ Users complain
☞ What should you ask?

• Solution: interview during demo
- select several users
- demo puts a user in a positive mindset
- demo steers the interview

First Project Plan

Use standard templates, including:
• project scope
☞ see "Setting Direction"

• opportunities
☞ e.g., skilled maintainers, readable source code, documentation

• risks
☞ e.g., absent test suites, missing libraries, …
☞ record likelihood (unlikely, possible, likely) & impact (high, moderate, low) for causing problems

• go/no-go decision
• activities
☞ fish-eye view

Forces — Initial Understanding

• Data is deceptive
☞ Always double-check your sources

• Understanding entails iteration
☞ Plan iteration and feedback loops

• Knowledge must be shared
☞ "Put the map on the wall"

• Teams need to communicate
☞ "Use their language"

Initial Understanding

understand ⇒ higher-level model

• Top down: Speculate about Design (recover design)
• Bottom up: Analyze the Persistent Data (recover database); Study the Exceptional Entities (identify problems)

Analyze the Persistent Data

Problem: Which objects represent valuable data?

Solution: Analyze the database schema
• Prepare Model
☞ tables ⇒ classes; columns ⇒ attributes
☞ candidate keys (naming conventions + unique indices)
☞ foreign keys (column types + naming conventions + view declarations + join clauses)

• Incorporate Inheritance
☞ one to one; rolled down; rolled up

• Incorporate Associations
☞ association classes (e.g. many-to-many associations)
☞ qualified associations

• Verification
☞ Data samples + SQL statements
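
The foreign-key heuristic from "Prepare Model" (naming conventions plus column types) can be sketched as a predicate over one column and one candidate target table. Assumptions: a column is taken to refer to table T when its name is T's name, optionally followed by "id" or "_id" (case-insensitive), and its type equals the type of T's key; POSIX <strings.h> is assumed for case-insensitive comparison, and the struct names are hypothetical. As the slide stresses, every candidate still has to be verified with data samples and SQL statements.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

struct column { const char *name; const char *type; };
struct table  { const char *name; struct column key; };

/* True if what follows the table name is empty, "id" or "_id". */
static bool is_id_suffix(const char *suffix)
{
    return suffix[0] == '\0'
        || strcasecmp(suffix, "id") == 0
        || strcasecmp(suffix, "_id") == 0;
}

/* Candidate foreign key: naming convention AND matching column type. */
bool is_candidate_fk(const struct column *col, const struct table *target)
{
    size_t len = strlen(target->name);
    if (strncasecmp(col->name, target->name, len) != 0)
        return false;                    /* name must start with the table name */
    if (!is_id_suffix(col->name + len))
        return false;                    /* ...followed only by an id suffix */
    return strcmp(col->type, target->key.type) == 0;
}
```

For a hypothetical table Insurance with key id: char(5), the Patient column insurance: char(5) from the following examples would be proposed as a foreign key, while insuranceID: char(7) would be rejected because its type differs.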

Example: One To One

Database tables (one per class):
• Patient: id: char(5), insuranceID: char(7), insurance: char(5)
• Salesman: id: char(5), company: char(40)
• Person: id: char(5), name: char(40), address: char(60)

Class diagram (Patient and Salesman inherit from Person):
• Patient: id: char(5), insuranceID: char(7), insurance: char(5)
• Salesman: id: char(5), company: char(40)
• Person: id: char(5), name: char(40), address: char(60)

Example: Rolled Down

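Database tables (Person columns rolled down into each subclass table):
• Patient: id: char(5), name: char(40), address: char(60), insuranceID: char(7), insurance: char(5)
• Salesman: id: char(5), name: char(40), address: char(60), company: char(40)

Class diagram (Patient and Salesman inherit from Person):
• Patient: id: char(5), insuranceID: char(7), insurance: char(5)
• Salesman: id: char(5), company: char(40)
• Person: id: char(5), name: char(40), address: char(60)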

Example: Rolled Up

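Database table (subclass columns rolled up into Person):
• Person: id: char(5), name: char(40), address: char(60), insuranceID: char(7) «optional», insurance: char(5) «optional», company: char(40) «optional»

Class diagram (Patient and Salesman inherit from Person):
• Patient: id: char(5), insuranceID: char(7), insurance: char(5)
• Salesman: id: char(5), company: char(40)
• Person: id: char(5), name: char(40), address: char(60)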
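
With a rolled-up table, the subclass of each row can often be read off from which optional columns are filled, which is one way to check the recovered hierarchy against data samples. A sketch assuming that NULL values arrive as null pointers and that each optional column belongs to exactly one subclass; the struct, enum and function names are hypothetical.

```c
#include <stddef.h>

/* One row of the rolled-up Person table; NULL marks an absent optional value. */
struct person_row {
    const char *id;           /* char(5)  */
    const char *name;         /* char(40) */
    const char *address;      /* char(60) */
    const char *insuranceID;  /* char(7), optional: Patient only  */
    const char *insurance;    /* char(5), optional: Patient only  */
    const char *company;      /* char(40), optional: Salesman only */
};

enum person_kind { KIND_PERSON, KIND_PATIENT, KIND_SALESMAN, KIND_AMBIGUOUS };

/* Infers the subclass of a row from its filled optional columns. */
enum person_kind classify_row(const struct person_row *row)
{
    int patient  = row->insuranceID != NULL || row->insurance != NULL;
    int salesman = row->company != NULL;
    if (patient && salesman)
        return KIND_AMBIGUOUS;   /* violates the exclusivity assumption */
    if (patient)
        return KIND_PATIENT;
    if (salesman)
        return KIND_SALESMAN;
    return KIND_PERSON;          /* only the Person columns are set */
}
```

Rows classified as ambiguous are exactly the ones worth following up with SQL queries, since they contradict the assumed hierarchy.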

Speculate about Design

Problem: How do you recover design from code?

Solution: Develop hypotheses and check them

• Develop a plausible class diagram and iteratively check and refine your design against the actual code.

Variants:
• Speculate about Business Objects
• Speculate about Design Patterns
• Speculate about Architecture

Study the Exceptional Entities

Problem: How can you quickly identify design problems?

Solution: Measure software entities and study the anomalous ones

• Use simple metrics

• Visualize metrics to get an overview

• Browse the code to get insight into the anomalies
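
A minimal sketch of measuring entities and singling out the anomalous ones: flag every entity whose metric (hypothetically, lines of code per class) lies more than k standard deviations above the mean. The z-score rule and the value of k are assumptions; quartile-based rules or fixed thresholds are equally common choices.

```c
#include <stddef.h>

/* Sets flags[i] = 1 when metric[i] lies more than k standard deviations
 * above the mean of all n values; returns the number of flagged entities. */
size_t flag_outliers(const double *metric, size_t n, double k, int *flags)
{
    if (n == 0)
        return 0;
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += metric[i];
    mean /= (double)n;

    double var = 0.0;                          /* population variance */
    for (size_t i = 0; i < n; i++)
        var += (metric[i] - mean) * (metric[i] - mean);
    var /= (double)n;

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double d = metric[i] - mean;
        /* d > k*sigma  <=>  d > 0 && d*d > k*k*var  (avoids sqrt) */
        flags[i] = d > 0.0 && d * d > k * k * var;
        count += (size_t)flags[i];
    }
    return count;
}
```

For class sizes 10, 12, 11, 9, 10 and 200 with k = 2, only the 200-line class is flagged; browsing that class is then the next step.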

Visualizing Metrics

Use simple metrics and layout algorithms.

[Figure: a node whose position (x,y), width, height and colour each represent one metric]

Visualize up to 5 metrics per node
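
CodeCrawler, listed among the tools earlier, draws such polymetric views; for instance, node width and height can show the number of attributes and methods of a class. A sketch of the mapping itself, assuming five generic metrics and a linear grey scale clamped to 0..255; which metric drives which visual attribute is a free choice, and all names are hypothetical.

```c
/* Five metrics of one entity (which ones is a free choice). */
struct metrics { double m1, m2, m3, m4, m5; };

/* Visual attributes of the node drawn for that entity. */
struct node_view {
    double x, y;            /* position */
    double width, height;   /* size     */
    unsigned char grey;     /* colour: 0 = white ... 255 = black */
};

/* m1, m2 -> position; m3 -> width; m4 -> height;
 * m5 -> grey level, scaled linearly against m5_max and clamped. */
struct node_view to_node_view(struct metrics m, double m5_max)
{
    struct node_view v;
    v.x = m.m1;
    v.y = m.m2;
    v.width = m.m3;
    v.height = m.m4;
    double t = (m5_max > 0.0) ? m.m5 / m5_max : 0.0;
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    v.grey = (unsigned char)(t * 255.0 + 0.5);   /* round to nearest level */
    return v;
}
```

In practice the position is often taken from a layout algorithm (e.g. a tree layout of the inheritance hierarchy) instead of from two metrics.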

Initial Understanding (revisited)

understand ⇒ higher-level model

• Top down: Speculate about Design (recover design)
• Bottom up: Analyze the Persistent Data (recover database); Study the Exceptional Entities (identify problems)
• ITERATION between top down and bottom up

Forces — Detailed Model Capture

• Details matter
☞ Pay attention to the details!

• Design remains implicit
☞ Record design rationale when you discover it!

• Design evolves
☞ Important issues are reflected in changes to the code!

• Code only exposes static structure
☞ Study dynamic behaviour to extract detailed design

Detailed Model Capture

Expose the design & make sure it stays exposed

• Keep track of your understanding: Tie Code and Questions
• Expose design: Refactor to Understand (supported by Write Tests to Understand)
• Expose collaborations: Step through the Execution
• Expose contracts: Look for the Contracts
☞ Use Your Tools
☞ Look for Key Methods
☞ Look for Constructor Calls
☞ Look for Template/Hook Methods
☞ Look for Super Calls
• Expose evolution: Learn from the Past

Tie Code and Questions

Problem: How do you keep track of your understanding?

Solution: Annotate the code

• List questions, hypotheses, tasks and observations.

• Identify yourself!

• Use conventions to locate/extract annotations.
• Annotate as comments, or as methods
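
A fixed comment convention makes annotations easy to locate and extract mechanically. A sketch assuming a hypothetical convention, /* K(initials): text */ with K one of Q, H, T or O for question, hypothesis, task or observation; the initials identify the annotator, and only the first such comment on a line is examined.

```c
#include <string.h>

// Hypothetical convention: /* K(initials): text */ where K is
// Q (question), H (hypothesis), T (task) or O (observation).
// Returns K, or '\0' if the first comment on `line` is not an annotation;
// on success copies the initials (at most 7 characters) into `author`.
char find_annotation(const char *line, char author[8])
{
    const char *p = strstr(line, "/* ");
    if (p == NULL)
        return '\0';
    p += 3;                                    /* skip the comment opener */
    if (*p == '\0' || strchr("QHTO", *p) == NULL || p[1] != '(')
        return '\0';
    char kind = *p;
    const char *start = p + 2;                 /* first character of initials */
    const char *end = strchr(start, ')');
    if (end == NULL || end == start || end - start > 7 || end[1] != ':')
        return '\0';
    memcpy(author, start, (size_t)(end - start));
    author[end - start] = '\0';
    return kind;
}
```

Run over every line of the code base, such a function yields a list of open questions and hypotheses per author that can be reviewed or turned into tasks.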

Refactor to Understand

Problem: How do you decipher cryptic code?

Solution: Refactor it till it makes sense
• Goal (for now) is to understand, not to reengineer
• Work with a copy of the code
• Refactoring requires an adequate test base
☞ If this is missing, Write Tests to Understand

• Hints:
☞ Rename attributes to convey roles
☞ Rename methods and classes to reveal intent
☞ Remove duplicated code
☞ Replace condition branches by methods
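
Applied to the cryptic f0001 excerpt shown earlier, renaming and extracting the condition makes its intent visible. A self-contained sketch: the correspondences f0004 = outrange, f0005 = resume, s0004 = procstate and d0004 = SUSPENDED come from comparing the two versions of that excerpt, but the simplified process table, MAXPROCS = 8, the resumed counter and the predicate is_suspended are hypothetical.

```c
#include <stdbool.h>

#define MAXPROCS 8                 /* hypothetical table size */

enum proc_state { RUNNING, SUSPENDED };

/* Simplified process table entry; `resumed` only makes the effect observable. */
struct process { enum proc_state procstate; int resumed; };
static struct process proctbl[MAXPROCS];

/* Before:
 *   if(! f0004(i0001) && (g0002->g0003)[i0001].s0004 == d0004)
 *       f0005(i0001);
 */

/* After: names reveal intent, the condition becomes a predicate. */
static bool outrange(unsigned int pn) { return pn >= MAXPROCS; }

static bool is_suspended(unsigned int pn)
{
    return !outrange(pn) && proctbl[pn].procstate == SUSPENDED;
}

static void resume(unsigned int pn)
{
    proctbl[pn].procstate = RUNNING;
    proctbl[pn].resumed++;
}

void resume_if_suspended(unsigned int pn)
{
    if (is_suspended(pn))
        resume(pn);
}
```

A few small tests (suspend an entry, resume it once, check that a second call and an out-of-range number change nothing) are the adequate test base the slide asks for.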

Step Through the Execution

Problem: How do you uncover the run-time architecture?

Solution: Execute scenarios of known use cases and step through the code with a debugger

• Difficulties
☞ OO source code exposes a class hierarchy, not the run-time object collaborations
☞ Collaborations are spread throughout the code
☞ Polymorphism may hide which classes are instantiated

• Focussed use of a debugger can expose collaborations

Look for the Contracts

Problem: Which contracts does a class support?

Solution: Look for common programming idioms
• Look for "key methods"
☞ Intention-revealing names
☞ Key parameter types
☞ Recurring parameter types represent temporary associations

• Look for constructor calls
• Look for Template/Hook methods
• Look for super calls
• Use your tools!

Learn from the Past

Problem: How did the system get the way it is?

Solution: Compare versions to discover where code was removed

• Removed functionality is a sign of design evolution

• Use or develop appropriate tools

• Look for signs of:
☞ Unstable design: repeated growth and refactoring
☞ Mature design: growth, refactoring and stability
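
A sketch of comparing versions to discover where code was removed: given the size of each function or method (hypothetically, in lines of code) in an older and a newer version, report how many lines each entity lost, counting an entity that vanished as fully removed. The input format and names are assumptions; in practice these figures would be extracted from the version history, ideally over many releases.

```c
#include <stddef.h>
#include <string.h>

struct entity_size { const char *name; int loc; };

/* Returns the size of `name` in version `v`, or -1 if it does not exist there. */
static int size_in(const struct entity_size *v, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(v[i].name, name) == 0)
            return v[i].loc;
    return -1;
}

/* Writes to removed[i] the lines lost by old[i] (its whole size if it
 * vanished, 0 if it did not shrink); returns how many entities lost code. */
size_t find_removed_code(const struct entity_size *old, size_t n_old,
                         const struct entity_size *cur, size_t n_cur,
                         int *removed)
{
    size_t count = 0;
    for (size_t i = 0; i < n_old; i++) {
        int now = size_in(cur, n_cur, old[i].name);
        int lost = (now < 0) ? old[i].loc : old[i].loc - now;
        removed[i] = lost > 0 ? lost : 0;
        if (removed[i] > 0)
            count++;
    }
    return count;
}
```

Entities that repeatedly lose and regain code across releases point to unstable design; entities that shrank once and then stayed stable suggest a refactoring that has settled.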

Conclusion

• Setting Direction + First Contact ⇒ First Project Plan

• Initial Understanding + Detailed Model Capture
☞ Plan the work … and Work the plan
☞ Frequent and Short Iterations

• Issues
☞ scale
☞ speed vs. accuracy
☞ politics