Simian Documentation
Download Simian, extract the binary distribution, and then use these instructions to run the Similarity Analyzer tool:
Command Line Interface
Simian's command line interface allows you to run it from a shell, shell script, or batch file, scanning a directory for all files matching a pattern.
The general form for the Java version is:
java -jar simian.jar [options] [files]
The files can be specified as any regular shell glob or simply as a list of files, and can be mixed with the -includes option (see the examples below). Each example shows only the arguments that follow java -jar simian.jar.
For example, to find all Java files in all sub-directories of the current directory:
"**/*.java"
To find all Java files in the current directory and set the threshold to 3:
-threshold=3 "*.java"
To find all C# files in the current directory: "*.cs"
To find all C source and header files in all sub-directories of the current directory: "**/*.c" "**/*.h"
To find all C# and Java files in two different directories:
"/csharp-source/*.cs" "/java-source/*.java"
To find all Java files in all sub-directories, excluding test classes:
-includes=**/*.java -excludes=**/*Test.java
To find all Java files in the current directory and ignore numbers:
-ignoreNumbers "*.java"
To find all Ruby files and display the results in XML format:
-formatter=xml "*.rb"
To find all Ruby files and send the results in Emacs-compatible format to a file:
-formatter=emacs:c:\temp\simian.log "*.rb"
To read the configuration from a file in which each line specifies at most one valid command-line argument:
-config=simian.config
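To make the config file format concrete, here is a hypothetical sketch (not Simian's own parser) that turns the contents of a simian.config file into an argument list, one argument per line. The class name ConfigSketch is made up, and trimming whitespace and skipping blank lines are assumptions for this example; the options in the sample file are the ones used in the examples above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Simian's parser: illustrates the documented rule that
// each line of the config file holds at most one command-line argument.
// Trimming whitespace and skipping blank lines are assumptions.
public class ConfigSketch {

    static List<String> toArgs(String fileContents) {
        List<String> args = new ArrayList<>();
        for (String line : fileContents.split("\\R")) { // \R matches \n, \r\n, etc.
            String arg = line.strip();
            if (!arg.isEmpty()) {
                args.add(arg);
            }
        }
        return args;
    }

    public static void main(String[] args) {
        // Example simian.config contents, built from options shown in this document.
        String config = String.join("\n",
                "-threshold=3",
                "-ignoreNumbers",
                "-formatter=xml",
                "-includes=**/*.java",
                "-excludes=**/*Test.java");
        // Each list element corresponds to one command-line argument.
        System.out.println(toArgs(config));
    }
}
```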
Notes
The default JVM heap size should be adequate for most projects. If you encounter the following error, increase the maximum heap size using the -mx JVM option (spelled -Xmx on current JVMs):
Exception in thread "main" java.lang.OutOfMemoryError
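To check which heap limit the JVM actually applies, for instance before and after adding -Xmx512m, a small stand-alone class can print Runtime.getRuntime().maxMemory(). The class MaxHeap is a hypothetical diagnostic aid, not part of Simian.

```java
// Hypothetical diagnostic, not part of Simian: reports the maximum heap the
// current JVM will try to use. Launch it with the same heap option you give
// Simian (for example -Xmx512m) to confirm that the setting took effect.
public class MaxHeap {

    static long maxHeapMiB() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

On JDK 11 or later it can be run directly from source with java -Xmx512m MaxHeap.java. The reported figure may be slightly lower than the -Xmx value, because the JVM does not count all of its heap regions toward maxMemory().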
Ant Task
The Ant task lets you integrate Simian with Apache Ant™, a Java-based build tool.
Somewhere in your build.xml file, define the task:
<taskdef resource="simiantask.properties" classpath="simian.jar"/>
Then, within a target, run the checker. To use all the defaults:
<simian>
<fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>
To exclude test classes if they exist in the same tree as the source:
<simian>
<fileset dir="${src.main.dir}" includes="**/*.java" excludes="**/*Test.java"/>
</simian>
To change the minimum number of lines that are considered a match:
<simian threshold="6">
<fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>
To force the language used for processing:
<simian language="java">
<fileset dir="${src.main.dir}" includes="**/*.*"/>
</simian>
To make the build fail if one or more matches are found:
<simian failOnDuplication="true">
<fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>
To set a build property if one or more matches are found:
<simian failureProperty="test.failure">
<fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>
By default, Simian outputs plain text using the default Ant logger. You can override this with the nested formatter element. The formatter takes a type (one of "plain", "xml", "emacs", "vs", or "yml") and an optional file name (toFile). For example, to send output to a file:
<simian>
<formatter type="plain" toFile="simian-log.txt"/>
</simian>
To produce XML output:
<simian>
<formatter type="xml" toFile="simian-log.xml"/>
</simian>
You may specify any number of formatter elements, allowing you to produce both XML and plain text output if necessary.
Simian Processing Options
Option | Languages | Default | Possible values | Description |
---|---|---|---|---|
formatter | all | none | plain, xml, emacs, vs (Visual Studio), yaml, null | Specifies the format in which processing results will be produced. |
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines. |
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes all files are in the specified language |
defaultLanguage | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes files are in the specified language if none can be inferred |
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected |
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports |
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers |
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Curly braces are ignored. |
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | Completely ignores all identifiers. |
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | Matches identifiers irrespective of case, e.g. MyVariableName and myvariablename would both match. |
ignoreRegions | C# | false | boolean | Ignore lines between #region/#endregion. |
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | "Hello" and "Goodbye" would both match. |
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | true | boolean | "Hello, World" and "HELLO, WORLD" would both match. |
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | int x = 1; and int x = 576; would both match. |
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | 'A' and 'Z' would both match. |
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | 'A' and 'a' would both match. |
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | 'A', "one" and 27.8 would all match. |
ignoreSubtypeNames | Java, C, Groovy | false | boolean | BufferedReader, StringReader and Reader would all match. |
ignoreModifiers | Java, C#, C, C++, JavaScript, Groovy | true | boolean | Ignores modifiers such as public, protected, static, etc. |
ignoreVariableNames | Java, C, Groovy | false | boolean | Completely ignores variable names (field, parameter and local), e.g. int foo = 1; and int bar = 1; would both match. |
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one. |
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one. |
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one. |
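To illustrate what two of these options mean, here is a naive brute-force comparison written only for illustration; it is not Simian's algorithm, which this page does not describe. It treats threshold as "at least this many consecutive matching lines" and approximates ignoreNumbers by replacing every numeric literal with a placeholder. The class and method names are hypothetical.

```java
import java.util.List;

// Illustrative sketch only, NOT Simian's algorithm: a naive comparison showing
// what `threshold` and `ignoreNumbers` mean for two files given as lists of lines.
public class DuplicateSketch {

    // Approximates ignoreNumbers: every integer or decimal literal becomes "#",
    // so "int x = 1;" and "int x = 576;" normalize to the same text.
    static String normalize(String line, boolean ignoreNumbers) {
        String s = line.trim();
        return ignoreNumbers ? s.replaceAll("\\b\\d+(\\.\\d+)?\\b", "#") : s;
    }

    // True if `a` and `b` share at least `threshold` consecutive lines that are
    // equal after normalization (brute force, fine for a small illustration).
    static boolean hasDuplicate(List<String> a, List<String> b,
                                int threshold, boolean ignoreNumbers) {
        for (int i = 0; i + threshold <= a.size(); i++) {
            for (int j = 0; j + threshold <= b.size(); j++) {
                int k = 0;
                while (k < threshold
                        && normalize(a.get(i + k), ignoreNumbers)
                               .equals(normalize(b.get(j + k), ignoreNumbers))) {
                    k++;
                }
                if (k == threshold) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> a = List.of("int x = 1;", "int y = 2;", "return x + y;");
        List<String> b = List.of("int x = 576;", "int y = 3;", "return x + y;");
        System.out.println(hasDuplicate(a, b, 3, false)); // false: numbers differ
        System.out.println(hasDuplicate(a, b, 3, true));  // true: numbers ignored
        System.out.println(hasDuplicate(a, b, 4, true));  // false: only 3 lines exist
    }
}
```

Raising the threshold makes a match harder to reach, while each ignore option widens what counts as an equal line; the real tool combines many such normalizations, which this sketch does not attempt to reproduce.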
Simian recognizes the following file extensions:
Language | Extensions |
---|---|
Java | java |
C Sharp | cs, c#, csharp |
C | c, h, m |
C++ | cpp, hpp, cplusplus, inl |
Ruby | rb, ruby |
COBOL | cobol |
ABAP | abap |
XML | xml, xsl, xsd |
Jakarta Server Pages | jsp |
ASP | asp |
JavaScript | js, javascript |
HTML | html, htm |
Visual Basic | vb, bas, cls, frm |
Lisp | lisp, lsp |
Groovy | groovy |
Plain Text | default when language cannot be determined |
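The extension table can be read as a simple lookup with a Plain Text fallback. The sketch below encodes it that way. It is not Simian's code, and several points are assumptions: extensions are matched case-insensitively, a forced language (modelled on the language option) takes precedence over the extension, and a fallback (modelled on defaultLanguage) applies only when no extension matches. The class and parameter names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative sketch of the extension table above; NOT Simian's own code.
// Assumptions: case-insensitive extensions, `forced` wins over the extension,
// and `fallback` is used only when no extension matches.
public class LanguageByExtension {

    // Extension -> language name, copied from the table above.
    private static final Map<String, String> BY_EXTENSION = Map.ofEntries(
            Map.entry("java", "Java"),
            Map.entry("cs", "C Sharp"), Map.entry("c#", "C Sharp"), Map.entry("csharp", "C Sharp"),
            Map.entry("c", "C"), Map.entry("h", "C"), Map.entry("m", "C"),
            Map.entry("cpp", "C++"), Map.entry("hpp", "C++"),
            Map.entry("cplusplus", "C++"), Map.entry("inl", "C++"),
            Map.entry("rb", "Ruby"), Map.entry("ruby", "Ruby"),
            Map.entry("cobol", "COBOL"),
            Map.entry("abap", "ABAP"),
            Map.entry("xml", "XML"), Map.entry("xsl", "XML"), Map.entry("xsd", "XML"),
            Map.entry("jsp", "Jakarta Server Pages"),
            Map.entry("asp", "ASP"),
            Map.entry("js", "JavaScript"), Map.entry("javascript", "JavaScript"),
            Map.entry("html", "HTML"), Map.entry("htm", "HTML"),
            Map.entry("vb", "Visual Basic"), Map.entry("bas", "Visual Basic"),
            Map.entry("cls", "Visual Basic"), Map.entry("frm", "Visual Basic"),
            Map.entry("lisp", "Lisp"), Map.entry("lsp", "Lisp"),
            Map.entry("groovy", "Groovy"));

    /**
     * @param fileName file to classify, e.g. "Foo.java"
     * @param forced   hypothetical stand-in for the `language` option (may be null)
     * @param fallback hypothetical stand-in for `defaultLanguage` (may be null)
     */
    static String languageFor(String fileName, String forced, String fallback) {
        if (forced != null) {
            return forced;
        }
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String inferred = BY_EXTENSION.get(ext);
        if (inferred != null) {
            return inferred;
        }
        return fallback != null ? fallback : "Plain Text";
    }

    public static void main(String[] args) {
        System.out.println(languageFor("Foo.java", null, null));    // Java
        System.out.println(languageFor("util.h", null, null));      // C
        System.out.println(languageFor("README", null, null));      // Plain Text
        System.out.println(languageFor("notes.txt", null, "java")); // java
    }
}
```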