Merged
41 commits
da142a0
new result objects
304NotModified Nov 28, 2015
0899527
Helper method
304NotModified Nov 28, 2015
db37afe
remove reset
304NotModified Nov 28, 2015
a024933
remove unused
304NotModified Nov 28, 2015
a1c5167
remove report()
304NotModified Nov 28, 2015
d836077
intermediate summary
304NotModified Nov 28, 2015
05864df
intermediate step
304NotModified Nov 28, 2015
d2ab0ca
return detectionsummary
304NotModified Nov 28, 2015
e55a115
using report
304NotModified Nov 28, 2015
4e5ef11
static
304NotModified Nov 28, 2015
1abc27c
cleanup
304NotModified Nov 28, 2015
af002cd
intermediate
304NotModified Nov 28, 2015
f15ee04
cleanup
304NotModified Nov 28, 2015
68edab0
renames
304NotModified Nov 28, 2015
efd95e4
todo added
304NotModified Nov 28, 2015
c3bf722
refactor
304NotModified Nov 28, 2015
55d531f
extract method, remove duplicate code
304NotModified Nov 28, 2015
ca89654
remove unused
304NotModified Nov 28, 2015
47685d5
cleanup
304NotModified Nov 28, 2015
7850bc2
removed universalDetector
304NotModified Nov 28, 2015
644f16a
protected fields => gone
304NotModified Nov 28, 2015
1ab718f
removed isdone()
304NotModified Nov 28, 2015
bcae1cf
cleanup
304NotModified Nov 28, 2015
b7c2aed
added todo
304NotModified Nov 28, 2015
5368fbf
refactor
304NotModified Nov 28, 2015
7491b3e
dumpstatus fix
304NotModified Nov 28, 2015
6f63289
refactor
304NotModified Nov 28, 2015
b014912
getFromFile
304NotModified Nov 28, 2015
1c852ed
remove useless interface
304NotModified Nov 28, 2015
55f265d
to static
304NotModified Nov 28, 2015
d652377
move
304NotModified Apr 7, 2017
d8c5c17
migrate to vs2017 and .net core
304NotModified Apr 7, 2017
58e0b4c
adjust namespaces
304NotModified Apr 7, 2017
9d1fdd2
cleanup
304NotModified Apr 7, 2017
a4cbc73
renames & cleanup
304NotModified Apr 7, 2017
acec9a7
move to new version
304NotModified Apr 7, 2017
c12ce6d
test also net4
304NotModified Apr 7, 2017
b5cc4f5
Update README.md
304NotModified Apr 7, 2017
272b5f0
strong name
304NotModified Apr 7, 2017
3781cf8
xml docs
304NotModified Apr 7, 2017
13651f6
fix warnings
304NotModified Apr 7, 2017
63 changes: 63 additions & 0 deletions .gitattributes
@@ -0,0 +1,63 @@
###############################################################################
# Set default behavior to automatically normalize line endings.
###############################################################################
* text=auto

###############################################################################
# Set default behavior for command prompt diff.
#
# This is needed for earlier builds of msysgit that do not have it on by
# default for csharp files.
# Note: This is only used by command line
###############################################################################
#*.cs diff=csharp

###############################################################################
# Set the merge driver for project and solution files
#
# Merging from the command prompt will add diff markers to the files if there
# are conflicts (Merging from VS is not affected by the settings below, in VS
# the diff markers are never inserted). Diff markers may cause the following
# file extensions to fail to load in VS. An alternative would be to treat
# these files as binary and thus will always conflict and require user
# intervention with every merge. To do so, just uncomment the entries below
###############################################################################
#*.sln merge=binary
#*.csproj merge=binary
#*.vbproj merge=binary
#*.vcxproj merge=binary
#*.vcproj merge=binary
#*.dbproj merge=binary
#*.fsproj merge=binary
#*.lsproj merge=binary
#*.wixproj merge=binary
#*.modelproj merge=binary
#*.sqlproj merge=binary
#*.wwaproj merge=binary

###############################################################################
# behavior for image files
#
# image files are treated as binary by default.
###############################################################################
#*.jpg binary
#*.png binary
#*.gif binary

###############################################################################
# diff behavior for common document formats
#
# Convert binary document formats to text before diffing them. This feature
# is only available from the command line. Turn it on by uncommenting the
# entries below.
###############################################################################
#*.doc diff=astextplain
#*.DOC diff=astextplain
#*.docx diff=astextplain
#*.DOCX diff=astextplain
#*.dot diff=astextplain
#*.DOT diff=astextplain
#*.pdf diff=astextplain
#*.PDF diff=astextplain
#*.rtf diff=astextplain
#*.RTF diff=astextplain
1 change: 0 additions & 1 deletion AUTHORS

This file was deleted.

1 change: 0 additions & 1 deletion COPYING

This file was deleted.

Empty file removed ChangeLog
Empty file.
34 changes: 0 additions & 34 deletions INSTALL

This file was deleted.

13 changes: 0 additions & 13 deletions Makefile.am

This file was deleted.

35 changes: 0 additions & 35 deletions Makefile.include

This file was deleted.

Empty file removed NEWS
Empty file.
64 changes: 33 additions & 31 deletions README.md
@@ -1,17 +1,30 @@
[![Build status](https://ci.appveyor.com/api/projects/status/ouo7t319ixcokxer/branch/master?svg=true)](https://ci.appveyor.com/project/304NotModified/ude/branch/master)
[![Build status](https://ci.appveyor.com/api/projects/status/xr59ab52cav8vuph/branch/master?svg=true)](https://ci.appveyor.com/project/304NotModified/utf-unknown/branch/master)

<!--
[![codecov.io](https://codecov.io/github/UniversalCharsetDetector/ude/coverage.svg?branch=master)](https://codecov.io/github/UniversalCharsetDetector/ude?branch=master)
-->

Detect character set for files, streams and other bytes.

Detection of character sets with a simple and redesigned interface.

#WIP
Work in progress!
This package is based on [Ude](https://github.com/errepi/ude),
which is a port of the [Mozilla Universal Charset Detector](https://mxr.mozilla.org/mozilla/source/extensions/universalchardet/).


The interface and other classes have been redesigned for easier use and a better object-oriented design (OOD). Unit tests and CI have been added.

#About this library
Features:

- OOD
- Moved to netstandard
- Added more unit tests
- Builds on CI (AppVeyor)
- Strong named
- Documentation added

#Docs

Ude is a C# port of [Mozilla Universal Charset Detector](http://mxr.mozilla.org/mozilla/source/extensions/universalchardet/src/).
# Docs

The article "[A composite approach to language/encoding detection](http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html)" describes the charsets detection algorithms implemented by the library.

@@ -38,37 +51,26 @@ Ude can recognize the following charsets:
* ASCII

## Platform
Windows. Linux work in progress (Mono and CoreCLR)

.NET 4.0 and .NET Standard 1.3

## Usage
### Example

```c#
public static void Main(String[] args)
{
    string filename = args[0];
    using (FileStream fs = File.OpenRead(filename)) {
        Ude.CharsetDetector detector = new Ude.CharsetDetector();
        detector.Feed(fs);
        detector.DataEnd();
        if (detector.Charset != null) {
            Console.WriteLine("Charset: {0}, confidence: {1}", detector.Charset, detector.Confidence);
        } else {
            Console.WriteLine("Detection failed.");
        }
    }
}
```

## Other portings
The original Mozilla Universal Charset Detector has been ported to a variety of languages. Among these, a Java port:
Use the static detectX methods from `CharsetDetector`.

* [juniversalchardet](http://code.google.com/p/juniversalchardet/)
### Example

from which I copied a few data structures, and a Python port:
```c#
// Detect from a file
var fileResult = CharsetDetector.DetectFromFile("c:/myfile.txt");
Encoding encoding = fileResult.Detected.Encoding; // or fileResult.Detected.EncodingName
float confidence = fileResult.Detected.Confidence; // confidence between 0 and 1
var allDetails = fileResult.Details;

// Detect from a stream
var streamResult = CharsetDetector.DetectFromStream(stream);

// Detect from bytes
var bytesResult = CharsetDetector.DetectFromBytes(byteArray);

* [chardet](http://chardet.feedparser.org/)
```
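
Once an `Encoding` has been detected, a typical next step is to decode the raw bytes with it. The following is a minimal sketch using only `System.Text`. The `DecodeExample` class, its `Decode` helper, and the UTF-8 fallback for a missing result are hypothetical and not part of this library; the `detected` value stands in for `result.Detected.Encoding`.

```c#
using System;
using System.Text;

public static class DecodeExample
{
    // Hypothetical helper: decodes raw bytes with the detected encoding.
    // Falling back to UTF-8 when nothing was detected (null) is an
    // assumption made for this sketch, not library behavior.
    public static string Decode(byte[] bytes, Encoding detected)
    {
        Encoding encoding = detected ?? Encoding.UTF8;
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Stand-in for result.Detected.Encoding from a detection call.
        Encoding detected = Encoding.UTF8;
        byte[] bytes = Encoding.UTF8.GetBytes("hello");
        Console.WriteLine(Decode(bytes, detected));
    }
}
```

Keeping the fallback in one helper means callers do not have to repeat the null check; whether UTF-8 is the right default depends on the data being read.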

## License

34 changes: 34 additions & 0 deletions UTF-unknown.sln
@@ -0,0 +1,34 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.26228.9
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "UTF-unknown", "src\UTF-unknown.csproj", "{64CA7BA7-EFD9-4475-BB66-40B187622A73}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ConsoleExample", "example\ConsoleExample.csproj", "{386C6ABF-44EA-4418-B90E-E8D21E4C2475}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "UTF-unknown.Tests", "Tests\UTF-unknown.Tests.csproj", "{1922DCC9-A45F-4627-9087-CD492BBF7F38}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{64CA7BA7-EFD9-4475-BB66-40B187622A73}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{64CA7BA7-EFD9-4475-BB66-40B187622A73}.Debug|Any CPU.Build.0 = Debug|Any CPU
{64CA7BA7-EFD9-4475-BB66-40B187622A73}.Release|Any CPU.ActiveCfg = Release|Any CPU
{64CA7BA7-EFD9-4475-BB66-40B187622A73}.Release|Any CPU.Build.0 = Release|Any CPU
{386C6ABF-44EA-4418-B90E-E8D21E4C2475}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{386C6ABF-44EA-4418-B90E-E8D21E4C2475}.Debug|Any CPU.Build.0 = Debug|Any CPU
{386C6ABF-44EA-4418-B90E-E8D21E4C2475}.Release|Any CPU.ActiveCfg = Release|Any CPU
{386C6ABF-44EA-4418-B90E-E8D21E4C2475}.Release|Any CPU.Build.0 = Release|Any CPU
{1922DCC9-A45F-4627-9087-CD492BBF7F38}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{1922DCC9-A45F-4627-9087-CD492BBF7F38}.Debug|Any CPU.Build.0 = Debug|Any CPU
{1922DCC9-A45F-4627-9087-CD492BBF7F38}.Release|Any CPU.ActiveCfg = Release|Any CPU
{1922DCC9-A45F-4627-9087-CD492BBF7F38}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal
24 changes: 0 additions & 24 deletions Ude.mds

This file was deleted.

37 changes: 0 additions & 37 deletions Ude.sln

This file was deleted.

18 changes: 0 additions & 18 deletions Ude.sln.DotSettings

This file was deleted.
