Simian Documentation

Download Simian and extract the binary file, then use these instructions to run the Similarity Analyzer tool:

Command Line Interface

Simian's command line interface allows you to run it from a shell, shell script, or batch file, scanning a directory for all files matching a pattern.

The general form for the Java version is: java -jar simian.jar [options] [files]

The files can be specified as any regular shell glob or simply a list of files and can be mixed with the -includes option. (See below for examples.)
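Whether a pattern is quoted decides who expands it. The sketch below is a demonstration only: it uses a throwaway directory with two made-up file names (A.java, B.java) and shows what a program such as Simian would receive as arguments in each case. It does not run Simian.

```shell
#!/bin/sh
# Demonstrates what a program receives for a quoted versus an unquoted glob.
# The directory and file names are made up for the demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/A.java" "$demo_dir/B.java"
cd "$demo_dir" || exit 1

# Quoted: the shell passes the pattern through unchanged (one argument).
set -- "*.java"
quoted_count=$#
quoted_first=$1

# Unquoted: the shell expands the pattern into a list of files first.
set -- *.java
unquoted_count=$#

echo "quoted:   $quoted_count argument(s), first is $quoted_first"
echo "unquoted: $unquoted_count argument(s)"
```

Both forms fit the description above: unquoted, the shell turns the glob into a list of files; quoted, the pattern reaches Simian as written. Quoting is the safer choice for "**" patterns, because POSIX sh (and bash unless globstar is enabled) treats "**" the same as "*".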

For example, to find all Java files in all sub-directories of the current directory: "**/*.java"

To find all Java files in the current directory and set the threshold to 3: -threshold=3 "*.java"

To find all C# files in the current directory: "*.cs"

To find all C source and header files in all sub-directories of the current directory: "**/*.c" "**/*.h"

To find all C# and Java files in two different directories: "/csharp-source/*.cs" "/java-source/*.java"

To find all Java files in all sub-directories, excluding Test classes: -includes=**/*.java -excludes=**/*Test.java

To find all Java files in the current directory and ignore numbers: -ignoreNumbers "*.java"

To find all Ruby files and display the results in XML format: -formatter=xml "*.rb"

To find all Ruby files and send the results in Emacs-compatible format to a file: -formatter=emacs:c:\temp\simian.log "*.rb"

To read configuration from a file (where each line of the file specifies at most one of any of the valid command-line arguments): -config=simian.config
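As a rough sketch of the configuration-file approach, the script below writes a simian.config with one argument per line and then passes it with -config. The specific arguments are taken from the examples above. Writing each line exactly as it would appear on the command line, leading dash included, is an assumption here. The guard around the java call is only there so the script stays harmless when Java or simian.jar is missing.

```shell
#!/bin/sh
# Write a hypothetical simian.config: one command-line argument per line.
# Assumption: each line is spelled exactly as it would be on the command line.
cat > simian.config <<'EOF'
-threshold=3
-ignoreNumbers
-includes=**/*.java
-excludes=**/*Test.java
EOF

# Run Simian only when Java and the jar are present; otherwise show the command.
if command -v java >/dev/null 2>&1 && [ -f simian.jar ]; then
  java -jar simian.jar -config=simian.config
else
  echo "would run: java -jar simian.jar -config=simian.config"
fi
```

Keeping the arguments in a file also sidesteps shell quoting, since the patterns are never seen by the shell.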

Notes

The default JVM heap size is adequate for most projects. If you encounter the following error, increase the maximum heap size using the -Xmx JVM option (spelled -mx on older JVMs).

Exception in thread "main" java.lang.OutOfMemoryError
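A minimal sketch of raising the heap size follows. The value 512m is an arbitrary illustration, not a recommendation from Simian, and -Xmx is the standard JVM spelling of the maximum heap option. The point to note is ordering: JVM options must come before -jar, because everything after the jar name is passed to Simian itself.

```shell
#!/bin/sh
# Heap size chosen for illustration only; adjust to the size of the project.
heap_size="512m"

# JVM options go before -jar; arguments after the jar name go to Simian.
simian_cmd="java -Xmx${heap_size} -jar simian.jar"

# Print the full command line rather than running it.
echo "$simian_cmd \"**/*.java\""
```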

Ant Task

This method allows you to integrate Simian with Apache Ant™, a Java-based build tool.

Somewhere in your build.xml file, define the task:

<taskdef resource="simiantask.properties" classpath="simian.jar"/>

Then create a target to run the checker. To use all defaults:

<simian>  
  <fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>

To exclude test classes if they exist in the same tree as the source:

<simian threshold="6">
  <fileset dir="${src.main.dir}" includes="**/*.java" excludes="**/*Test.java"/>
</simian>

To change the minimum number of lines that is considered a match:

<simian threshold="6">
  <fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>

To force the language used for processing:

<simian language="java">
  <fileset dir="${src.main.dir}" includes="**/*.*"/>
</simian>

To have the build fail if one or more matches are found:

<simian failOnDuplication="true">
  <fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>

To set a build property if one or more matches are found:

<simian failureProperty="test.failure">
  <fileset dir="${src.main.dir}" includes="**/*.java"/>
</simian>

By default, Simian outputs plain text using the default Ant logger. You can override this by using the nested formatter element. The formatter takes a type ("plain", "xml", "emacs", "vs", or "yaml") and an optional filename (toFile). For example, to send output to a file:

<simian>
  <formatter type="plain" toFile="simian-log.txt"/>
</simian>

To produce XML output:

<simian>
  <formatter type="xml" toFile="simian-log.xml"/>
</simian>

You may specify any number of formatter elements, allowing you to produce both XML and plain text output if necessary.
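To show how the pieces of this section fit together, the sketch below writes a small build file and runs its target. The file name build-simian.xml, the project and target names, and the src/main/java directory are all hypothetical. Combining a fileset with two formatter elements in one simian element is an assumption based on the separate snippets above.

```shell
#!/bin/sh
# Write a hypothetical build file that combines the snippets from this section.
# The quoted 'EOF' keeps ${src.main.dir} literal so Ant, not the shell, resolves it.
cat > build-simian.xml <<'EOF'
<project name="simian-demo" default="simian">
  <property name="src.main.dir" value="src/main/java"/>
  <taskdef resource="simiantask.properties" classpath="simian.jar"/>

  <target name="simian">
    <simian threshold="6">
      <fileset dir="${src.main.dir}" includes="**/*.java" excludes="**/*Test.java"/>
      <formatter type="plain" toFile="simian-log.txt"/>
      <formatter type="xml" toFile="simian-log.xml"/>
    </simian>
  </target>
</project>
EOF

# Run the target only when Ant and the jar are present; otherwise show the command.
if command -v ant >/dev/null 2>&1 && [ -f simian.jar ]; then
  ant -f build-simian.xml simian
else
  echo "would run: ant -f build-simian.xml simian"
fi
```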


Simian Processing Options

Option | Languages | Default | Possible values | Description
formatter | all | none | plain, xml, emacs, vs (Visual Studio), yaml, null | Specifies the format in which processing results will be produced.
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines.
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes all files are in the specified language.
defaultLanguage | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes files are in the specified language if none can be inferred.
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected.
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports.
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers.
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Curly braces are ignored.
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | Completely ignores all identifiers.
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | Matches identifiers irrespective of case. E.g., MyVariableName and myvariablename would both match.
ignoreRegions | C# | false | boolean | Ignores lines between #region/#endregion.
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | "abc" and "def" would both match.
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | true | boolean | "Hello, World" and "HELLO, WORLD" would both match.
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | int x = 1; and int x = 576; would both match.
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | 'A' and 'Z' would both match.
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | 'A' and 'a' would both match.
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | 'A', "one" and 27.8 would all match.
ignoreSubtypeNames | Java, C, Groovy | false | boolean | BufferedReader, StringReader and Reader would all match.
ignoreModifiers | Java, C#, C, C++, JavaScript, Groovy | true | boolean | Ignores modifiers such as public, protected, and static.
ignoreVariableNames | Java, C, Groovy | false | boolean | Completely ignores variable names (field, parameter and local). E.g., int foo = 1; and int bar = 1; would both match.
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one.
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one.
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one.

Simian recognizes the following file extensions:

Language | Extensions
Java | java
C Sharp | cs, c#, csharp
C | c, h, m
C++ | cpp, hpp, cplusplus, inl
Ruby | rb, ruby
COBOL | cobol
ABAP | abap
XML | xml, xsl, xsd
Jakarta Server Pages | jsp
ASP | asp
JavaScript | js, javascript
HTML | html, htm
Visual Basic | vb, bas, cls, frm
Lisp | lisp, lsp
Groovy | groovy
Plain Text | default when language cannot be determined